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The  relational  data  model  has  been  a dominant  data 
model  in  the  past  two  decades.  Most  of  research  efforts  on 
database  machines  have  been  in  support  of  processing  of  data 
in  this  model.  However,  the  relational  model  captures  very 
limited  semantics  and  lacks  many  other  features  that  are 
necessary  to  model  complex  data  in  various  applications. 
Furthermore,  the  excessive  number  of  relational  join  and  set 
operations,  caused  by  normalization  of  relations,  puts  an 
inherent  bound  on  the  performance  of  a database  machine,  no 
matter  how  fast  a join  or  set  processor  can  be  designed.  On 
the  other  hand,  semantic  data  models  provide  abundant 
semantic  constructs  which  allow  users  to  define  data  and 
relationships  among  data  in  a natural  manner.  Unnecessary 
join  and  set  operations  required  to  relate  data,  such  as 
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those  in  relational  model,  can  be  avoided.  Also  in  recent 
years,  database  management  systems  based  on  these  data 
models  are  further  enhanced  with  increased  functionalities 
and  higher-level  data  manipulation  capabilities  by  being 
incorporated  into  an  object-oriented  paradigm.  In  this 
dissertation,  the  design  and  evaluation  of  an  Object  Flow 
Computer  (OFC)  is  presented.  The  OFC  is  used  to  directly 
support  high-level,  object-oriented  semantic  data  models.  A 
novel  two-phase  parallel  query  processing  strategy  (PQPS)  is 
designed  and  used  within  the  OFC.  A set  of  primitive 

operators  used  in  the  query  processing  strategy  is  also 
defined.  The  architecture  of  the  OFC  is  designed  to 
effectively  support  this  two-phase  PQPS.  By  integrating 
software  and  hardware  techniques,  the  OFC  achieves  a very 
significant  performance  improvement  over  conventional 
systems . 
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CHAPTER  1 
INTRODUCTION 

In  the  past  two  decades,  a considerable  amount  of 
research  exists  which  uses  hardware,  firmware  and  novel 
architectures  to  achieve  the  needed  efficiency  in 
implementing  database  management  functions  [HSI83d,  OZK86, 
SU88  ] . Despite  the  fact  that  much  progress  has  been  made, 
the  continuous  demand  for  increase  in  functionality  and 
higher  efficiency  in  database  management  systems  (DBMSs)  has 
diminished  the  value  of  past  efforts.  A database  computer 
that  has  a significant  performance  improvement  (one  or  two 
orders  of  improvement)  over  conventional  systems  is  still  a 
goal  rather  than  a reality.  The  reasons  for  the  past 
failure  to  achieve  the  goal  are  as  follows. 

First,  researchers  (including  ourselves)  have  been 
developing  database  computers  in  the  past  for  supporting  a 
relatively  primitive  data  model,  namely,  the  relational 
model.  The  relational  model  captures  very  limited  semantic 
constraints,  does  not  support  abstract  data  typing  and  the 
concept  of  inheritance,  does  not  allow  user-defined 
operations  and  knowledge  rules,  and  lacks  many  other 
features  provided  by  semantic  data  models  or  object-oriented 
data  models.  Furthermore,  the  normalization  process 
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required  in  the  relational  model  forces  data  and  their 
relationships  to  be  unnecessarily  scattered  in  a large 
number  of  relations.  To  relate  the  data  in  these  relations 
during  the  processing  of  a query,  a large  number  of 
relational  join  and  set  operations  need  to  be  performed.  No 
matter  how  database  computer  designers  improve  the 
performance  of  join  and  set  operations,  the  frequency  of 
performing  these  operations  places  a limit  on  the 
performance  improvement  of  their  database  computers. 

The  second  reason  is  as  follows.  Generally  speaking, 
query  processing  can  be  viewed  as  (1)  identifying  the 
qualified  data  from  a database,  and  (2)  processing  the 
qualified  data  to  produce  the  final  result.  We  observe  that 
these  are  indeed  two  distinct  phases  of  query  processing 
which  are  intermixed  in  conventional  processing  strategies. 
Consequently,  a large  amount  of  (sometimes  unnecessary)  data 
is  accessed  from  the  database  and  repeatedly  transmitted 
among  the  processors  during  the  identification  phase  of 
query  processing.  By  not  making  such  distinction, 
performance  again  suffers. 

Lastly,  most  of  the  existing  database  computers  use 
various  hardware  and  software  techniques  in  isolation;  a 
natural  tendency  for  a developing  field.  These  techniques 
have  not  been  fully  integrated  into  a single  system  to  take 
full  advantage  of  the  efficiencies  offered  by 
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special-purpose  hardware,  multiprocessor  architectures,  and 
parallel  processing  algorithms.  For  example,  the  advanced 
parallel  processing  technigues  such  as  query  decomposition, 
data  flow,  pipelining,  intermediate  result  sharing  among 
concurrent  queries  [W0N76,  BOR80,  JAR84 , KIM84,  FIS84, 
DEW86,  SU86 ] and  the  special-function  processors  [KUN80, 
TAN80 , BAN 8 3 , KIT84,  MIY84,  IWA84,  LEE87 ] have  not  been 
combined  and  used  to  their  full  advantages. 

This  dissertation  presents  the  design  and  evaluation  of 
an  Object  Flow  Computer  (OFC)  . The  main  features  of  this 
work  are  as  follows: 

1)  The  parallel  architecture  of  OFC  is  designed  to 
directly  support  an  object-oriented  semantic  association 
model  so  that  high-level  semantic  constructs  as  seen  by  the 
users  of  a large  knowledge  base  are  processed  by  hardware 
without  many  levels  of  transformations  to  a low-level 
representation . 

2)  A two-phase  parallel  query  processing  strategy 
(PQPS)  is  introduced.  During  the  first  phase,  objects  in 
the  form  of  object  identifiers  (OIDs)  flow  through  the 
system  in  a pipelined  fashion  to  select,  relate,  and 
identify  objects  in  various  involved  classes.  In  the  second 
phase,  the  OIDs  of  the  identified  objects  are  used  to 
process  their  attribute  values  to  produce  the  result.  This 
technique  results  in  avoiding  the  unnecessary  transfers  of  a 
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large  amount  of  data  from  secondary  storage  and  from 
processor  to  processor,  thus  reducing  or  eliminating  the  I/O 
bottleneck  found  in  large  database  systems. 

3)  Since  the  requirements  in  two  processing  phases 
are  different  from  each  other,  the  architecture  used  in  each 
phase  should  be  different.  in  our  design,  we  carefully 
evaluate  the  system's  performance  under  different  conditions 
to  find  out  the  appropriate  architecture  for  each  processing 
phase.  The  result  also  includes  the  consideration  of  total 
investment  cost  on  the  architecture.  This  is  very  different 
from  most  of  the  database  machine  designs  in  which  an 
arbitrary  type  of  interconnection  is  chosen  without  an 
evidence  to  show  the  necessity  of  using  it. 

4)  Instead  of  the  conventional  record-oriented  data 
organization,  a vertical  fragmentation  storage  model  is  used 
for  the  storage  of  data  of  a class.  This  storage  model  is 
similar  to  the  one  introduced  by  Copeland  [COP85,  KH088]  for 
the  relational  model.  However,  the  attributes  of  a relation 
in  their  design  is  fully  fragmented  indiscriminantly . In 
our  system,  the  fragmentation  strategy  is  guided  by  the 
semantics  of  the  data.  Along  with  the  two-phase  query 
processing  technique,  this  fragmentation  storage  model 
increases  the  concurrency  of  the  system  and  minimizes 
unnecessary  transfers  of  data. 
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5)  A set  of  primitive  operators  is  defined  for  the  OFC. 
Based  on  these  primitive  operators,  a high-level  database 
request  can  be  decomposed  and  executed  in  parallel. 
Furthermore,  these  OFC  operators  can  be  directly  implemented 
in  hardware  in  the  form  of  special  function  processors  using 
the  VLSI  technology. 

6)  Finally,  the  Object  Flow  Computer  combines  a number 
of  known  database  processing  techniques  such  as  query 
decomposition,  pipelining  mode  of  data  processing,  data  flow 
control  strategy,  and  hardware/ firmware  support  in  the  form 
of  special/general-purpose  processor.  In  this  manner,  the 
Object  Flow  Computer  offers  an  integrated  software/hardware 
solution  to  the  very  large  database  management  problem. 

This  dissertation  is  organized  as  follows.  Chapter  2 
gives  a survey  of  some  existing  research  works  that  are 
particularly  related  to  ours.  Chapter  3 discusses  the  need 
for  object-oriented  semantic  data  model  in  support  of 
advanced  database  application  areas.  An  example  model 
called  the  OSAM*  is  briefly  reviewed.  Chapter  4 presents 
the  general  architecture  of  the  Object  Flow  Computer.  The 
primitive  operators  of  this  computer  are  defined.  In 
Chapter  5,  the  use  of  the  Object  Flow  Computer  to  support 
the  Object-oriented  Semantic  Association  Model  (OSAM*)  is 
described  and  a two-phase  PQPS  is  presented.  In  Chapter  6, 
a comparison  of  the  two-phase  PQPS  with  a conventional 
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processing  strategy  is  performed  to  prove  the  superiority  of 
the  two-phase  strategy  under  various  conditions.  In  Chapter 
7,  a more  specific  OFC  architecture  with  some  options  for 
the  type  of  interconnection  among  processors  is  presented. 
A simulation  system  is  developed  to  simulate  the  concurrent 
query  processing  within  the  OFC  using  different 
interconnections.  The  simulation  results  and  explanation 
are  given  in  Chapter  8 , in  which  some  important  conclusions 
are  drawn  from  the  obtained  results.  Finally,  the 
concluding  remarks  and  future  research  possibilities  are 
presented  in  Chapter  9. 


CHAPTER  2 

SURVEY  OF  RELATED  WORKS 


The  design  and  development  of  computers  for  the 
efficient  implementation  of  object-oriented  languages  such 
as  SmallTalk  has  been  a popular  topic  of  research  for  a 
number  of  years.  Several  well-known  machines  has  been 
designed  and  implemented.  Some  example  systems  are:  Xerox's 
Dorado  [PIE83,  DEU84],  the  system  SOAR  from  the  University 
of  California,  Berkeley  [UNG84 , SAM86 ] , the  Caltech  Object 
Machine  [DAL85],  and  the  University  of  Toronto's  SWAMP 
[LEW86] . These  systems  are  designed  to  directly  support  the 
low-level  operations  needed  for  object  manipulations  such  as 
method  lookup,  context  allocation  and  deallocation,  message 
handling,  and  garbage  collection.  The  research  proposed 
here  differs  from  these  systems  in  the  following  two  ways. 
First,  the  proposed  OFC  is  designed  to  support  high-level 
object  manipulation  operations  on  an  object-oriented 
semantic  model  which  has  an  object  orientation  quite 
different  from  that  of  the  programming  language  SmallTalk. 
Secondly , the  OFC  deals  with  not  only  object— level 
manipulations  but  also  class-level  manipulations.  The 
latter  type  of  manipulations  represents  a property  of 
object-oriented  database  systems  that  is  lacking  in  the 
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existing  object-oriented  programming  languages.  Due  to 
these  differences,  we  shall  not  include  these  and  other 
related  systems  in  our  survey.  Instead,  we  shall 
concentrate  on  multi-processor  systems  designed  for  database 
applications. 

In  recent  years,  a number  of  multiprocessor  systems 
been  introduced  which  make  use  of  conventional 
computers  and/or  special-purpose  processors  in  various  novel 
architectures  to  solve  different  non-numeric  processing 
problems.  These  systems  can  be  broadly  classified  into  two 
categories:  multiprocessor  systems  with  replication  of 

functions  and  multiprocessor  systems  with  distribution  of 
functions.  in  our  proposed  system  (to  be  described  in 

Chapter  4),  each  special  function  unit  in  a processing  node 
performs  different  primitive  database  operation  and 
^■'■^^eren^:  processing  nodes  may  have  a number  of  these 

special  function  units.  As  a result,  our  proposed  system 
has  some  characteristics  of  both  classes  of  multiprocessor 
systems.  In  this  section,  we  will  survey  some  recent  works 
on  multiprocessor-based  database  machines  that  are  parti- 
cularly relevant  to  the  proposed  research. 

DIRECT  [ DEW79 , BOR82  ] is  a backend  multiprocessor 
database  machine  with  a MIMD  architecture  that  supports  both 
intra-  and  inter-query  concurrency.  The  query  processors  in 
DIRECT  are  dynamically  assigned  to  queries. 


Each  of  the 
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participating  relations  are  divided  into  pages,  which  are 
the  data  granule  that  moves  among  processors  and  shared 
memory  modules.  The  data  flow  mechanism  introduced  in  this 
system  was  not  fully  developed  until  later  works  by  the  same 
group  [DEW86] . In  DIRECT,  many  control  activities  and  data 
movements  are  required  in  query  execution.  The  result  is  an 
excessive  amount  of  overhead.  Other  limitations  include  a 
limited  I/O  bandwidth  and  the  use  of  conventional 
general-purpose  processors  to  perform  non-numerical 
operations  which  are  not  suitable  for  convention  processors. 
Unlike  DIRECT,  a highly  parallel  data  flow  scheme  is 
exploited  in  our  proposed  system.  Furthermore,  both 
general-purpose  and  special-purpose  processors  are  utilized. 

The  data  flow  concept  introduced  in  [ DEN74 , DEN75, 
DEN80]  has  been  applied  to  processing  non-numeric  data 
[BOR80,  B0R81,  B0R82,  BIT83].  Some  recent  multiprocessor 
systems  that  employ  pipeline  processing  schemes  include 
GRACE  [KIT83 , KIT84,  KIT85]  and  GAMMA  [DEW8  6 , GER8  6 ] . In 
both  machines,  hash-based  processing  techniques  are  used  for 
partitioning  relations.  By  hashing  relations  into 
partitions  before  executing  a binary  operation  (such  as 
join),  irrelevant  tuples  that  will  not  produce  any  result  in 
the  operation  can  be  eliminated.  However,  the  partitions 
may  overflow  during  hashing.  This  problem  is  handled  in 
GRACE  by  first  hashing  a relation  into  a large  number  of 


10 


partitions.  Then,  some  of  these  partitions  are  combined 
into  a larger  partition  with  a volume  less  than  the  capacity 
3.  processor.  GAMMA  overcomes  this  problem  by  resplitting 
the  overflowing  partitions  using  an  alternative  hash 
function.  Therefore,  some  additional  overhead  time  is 
required  to  regulate  the  partition  size.  Both  strategies 
could  fail  in  the  case  when  all  the  tuples  have  identical 
target  attribute  values,  which  may  never  be  partitioned  by 
any  hash  function. 

A recent  work  [VAL87 ] proposed  a data  structure  called 
a join  index  to  improve  the  performance  of  joins  in  the 
context  of  complex  queries.  Join  index  is  a pre- joined 
relation,  based  on  the  primary  keys,  and  is  stored 
separately  from  the  operand  relations.  It  was  assumed  that 
the  join  operations  are  performed  most  frequently  on  the 
primary  key  and  the  join  selectivity  is  high.  Joins  on  the 
other  attributes  are  performed  less  frequently  and  the  join 
selectivity  is  low.  Therefore,  it  is  worthwhile  to  index 
the  primary  key  attribute  and  perform  a pre- join  with  other 
relations.  This  work  demonstrated  that  by  using  join 
indices,  performance  of  joins  can  be  greatly  improved.  The 
pre-join  concept  is  exploited  fully  in  OSAM*  [SU83a,  SU89], 
an  object-oriented  semantic  data/knowledge  model  on  which 
our  proposed  work  is  based.  In  this  model,  relationship 
among  objects  are  explicitly  defined  and  the  corresponding 
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data  is  stored  as  "pre-join"  table. 

MIRDM  [QAD85 ] is  a system  that  employs  a multi-level 
indexing/clustering  techniques.  The  indexing  scheme  is 
supported  by  hardware  to  reduce  the  amount  of  data  to  be 
processed  in  retrievals  and  thus  improving  the  overall 
system  performance. 

One  serious  drawback  of  conventional  database  machines 
is  that  records  of  files,  without  being  vertically 
partitioned,  are  retrieved  from  disks  and  processed  in  their 
entirety.  In  doing  so,  some  attribute  columns  that  are  not 
immediately  needed  are  still  retrieved  and  transferred  from 
one  processor  to  another.  This  increases  the  amount  of  I/O 
operations  and  the  load  of  communication  channels,  thus 
increasing  the  seriousness  of  the  I/O  bottleneck.  For  some 
database  operations  such  as  join,  increasing  the  number  of 
data  pages  often  results  in  a great  increase  in  the 
processing  time  required  to  complete  the  operation.  Software 
and  hardware  techniques  that  seek  to  alleviate  this  problem 
by  adopting  an  attribute-based  internal  data  schema  are 
described  below. 

DEC  [BAN79]  and  MDBS  [HSI83a,  HSI83b,  HSI83c]  are 
database  machines  that  employ  an  attribute-based  data 
model.  A clustering  technique  is  used  to  group  records  into 
clusters.  This  technique  especially  benefits  retrieval, 
deletion,  and  update  types  of  queries  because  less  accesses 
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to  data  storage  are  required.  However,  relations  are  not 
physically  partitioned  into  vertical  fragments  in  the  above 
systems  (DBC,  MDBS,  MIRDM) . It  is  our  opinion,  also 
supported  by  other  researches  [VAL87 , HAM79 , NAV84 , COR87], 
that  if  files  are  vertically  fragmented  to  some  extent,  the 
system  performance  can  be  improved  due  to  the  fact  that 
transfer  of  unnecessary  data  is  reduced. 

A general  discussion  of  pros  and  cons  on  a decomposed 
storage  model  (DSM)  is  proposed  [COP85,  KH088].  In  this 
model,  a relation  is  fully  decomposed.  An  attribute  is 
stored  independently  with  its  surrogate  as  a binary 
relation.  The  DSM  maintains  two  copies  of  each  binary 
relation,  one  clustered  on  the  surrogates  and  the  other 
clustered  on  the  values.  Although  the  simplicity  and 
generality  of  this  scheme  is  desirable,  its  storage 
requirement  is  significantly  increased  and  updating  the 
binary  relations  can  be  a serious  problem.  In  a very  large 
database  environment,  the  complexity  of  data  update  is 
increased  exponentially.  Different  from  the  DSM  where  the 
data  is  fully  decomposed  indiscriminately,  our  partial 
partitioning  strategy  is  guided  by  the  semantics  of  the 
data. 

A query  processing  strategy  based  on  the  DSM  is 
proposed  by  the  same  group  [KH087,  KH088].  This  processing 
strategy  contains  four  phases:  select  phase,  pivot  phase. 
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value  materialization  phase,  and  composition  phase. 
Selection  operations  are  performed  in  the  select  phase. 
Joins  are  performed  in  the  pivot  phase.  Output  of  the  pivot 
phase  is  a set  of  binary  relations.  Each  of  these  binary 
relations  has  two  surrogate  fields,  one  of  which  is  the 
pivot  surrogate  which  is  used  to  relate  the  output 
relations  of  the  pivot  phase.  Also,  it  is  used  to  construct 
the  final  result  after  those  binary  relations  are 
materialized.  Similar  to  their  strategy,  our  two-phase 
processing  strategy  also  eliminates  a large  amount  of 
unnecessary  data  transfer.  However,  our  processing  strategy 
is  applied  to  an  0-0  semantic  data  model  rather  than  to  the 
(normalized)  relational  model.  As  a result,  it  eliminates 
much  of  the  unnecessary  join  operations. 

Other  database  computers  which  employ  some  form  of 
vertical  data-partitioning  concept  include  DBMAC  [MIS82, 

MIS83 ] , a data-flow  database  machine  [TAN80 , TAN8  3 ] , and 
DELTA  [KAK85] . 

To  summarize  the  brief  survey  given  above,  we  would 
like  to  point  out  that,  although  many  software  and  hardware 
techniques  have  been  introduced  by  the  existing  work  on 
database  machines,  they  have  been  used  in  isolation  in 
separate  systems.  it  is  necessary  to  integrate  these 
techniques  with  some  new  ones  to  be  introduced  in  our 
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research  to  achieve  the  speed  needed  in  a highly  parallel 
system  designed  for  managing  vast  databases. 


CHAPTER  3 

OBJECT-ORIENTED  SEMANTIC  DATA  MODELS 


The  continuous  demand  for  increase  in  functionality  and 
higher  efficiency  in  database  management  has  motivated 
several  important  areas  of  research  and  development  in 
recent  years.  To  satisfy  the  demand  for  increased 
functionality,  several  so-called  semantic  data  models  and 
high-level  data  languages  have  been  introduced.  The 
database  management  systems  (DBMSs)  using  these  models  can 
operate  more  intelligently  and  provide  the  users  with  more 
functionalities.  Another  trend  toward  increased 
functionality  is  to  take  the  object-oriented  approach  to 
database  and  DBMS  design.  The  object-oriented  paradigm 
introduced  by  artificial  intelligence  and  programming 
language  communities  possesses  many  desirable  features  for 
capturing  the  structural  properties  and  behavioral  aspects 
of  objects.  It  is  natural  to  combine  the  features  of 
semantic  modeling  and  the  object-oriented  paradigm  into  an 
object-oriented  semantic  model  for  building  future  DBMSs. 

In  the  following,  we  first  briefly  discuss  the 
important  features  of  semantic  data  model  and 
object-oriented  paradigm.  Then,  an  example  model,  called 
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the  Object-oriented  Semantic  Association  Model  (OSAM*) , is 
briefly  overviewed. 

3 . 1 Semantic  Data  Model  and  Object-Oriented  Paradicrm 

The  need  for  data  models  that  are  rich  in  semantics  has 
become  apparent  in  recent  years  as  database  management 
systems  are  used  for  supporting  engineering,  scientific, 
statistical , and  military  applications.  In  these  areas, 
databases  are  more  complex  both  structurally  and 
semantically  and  are  larger  in  size.  The  limited  semantic 
properties  captured  by  data  models  such  as  the  relational 
model  forces  the  users  to  do  more  in  application  programs  in 
order,  for  example,  to  maintain  the  semantic  constraints  and 
integrity  of  a database.  Whatever  semantics  not  captured  by 
the  model  are  represented  and  enforced  by  the  users  in  their 
application  programs.  Furthermore,  since  the  users' 
programs  directly  control  the  retrieval  and  manipulation  of 
data  in  a database,  the  DBMS  has  no  opportunity  to  optimize 
queries  to  achieve  the  needed  efficiency. 

The  limitations  in  data  modeling  capability  found  in 
the  existing  systems  have  motivated  many  researchers  in 
developing  semantic  models  [CHE76,  SMI77,  MYL78 , BR081, 
SU83a,  L0086] . A semantic  model  is  a data  model  that  is 
rich  in  semantics  and  contains  a number  of  general  modeling 
constructs  for  describing  the  semantic  properties  of 
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databases  explicitly.  The  main  advantages  of  a semantic 
model  over  the  conventional  data  models  are  twofold.  First, 
database  users,  designers,  and  custodians  can  benefit  from  a 
more  explicit,  complete,  and  direct  description  of  the 
semantic  properties  of  a database  in  its  creation,  use,  and 
maintenance.  Secondly,  a database  management  system  that 
uses  a semantic  model  as  its  underlying  data  model  can  make 
use  of  the  defined  properties  to  1)  unambiguously  interpret 
users'  gueries,  which  can  be  stated  in  a simple  way  without 
restating  the  properties  which  have  already  defined  in  the 
schema,  2)  enforce  rules  and  constraints,  and  3)  optimize 
the  queries,  thus  providing  ease  of  use,  functionality  and 
efficiency. 

Also  in  recent  years,  the  importance  of  the 
object-oriented  (0-0)  approach  to  programming  languages, 
systems,  and  data  modeling  has  become  apparent.  Several  of 
the  proposed  semantic  models  have  taken  the  object-oriented 
approach  to  data  modeling  [COP84,  BAT84 , ZD086,  SU89].  The 
reason  is  that  the  object-oriented  paradigm  offers  several 
features  that  meet  the  data  modeling  requirements  of  many 
database  application  areas.  For  example,  the  data 
abstraction  feature  allows  complex  data  types  to  be  defined 
in  terms  of  either  system-  or  user-defined  data  types.  This 
feature  is  important  for  modeling  complex  objects  found  in 
CAD/ CAM  databases.  The  encapsulation  feature  of  0-0  systems 
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allows  the  separation  of  the  object  class  specification  from 
object  class  implementation.  This  is  ideal  for  the  use  and 
maintenance  of  the  database  system  since  (1)  the  users  do 
not  have  to  know  the  implementation  details  to  use  the 
database  system,  and  (2)  modification  to  the  implementation 
will  not  affect  the  specification  and  thus  the  applications. 
Another  feature  of  the  0-0  paradigm  is  the  inheritance  of 
instance  variables  and  operations.  A subclass  (e.g.,  cars) 
can  inherit  the  instance  variables  and  operations  defined 
for  its  superclass  (e.g.,  vehicles).  This  property  is 
important  to  database  systems  since  the  inheritance  of 
structural  properties,  operations,  and  rules  can  greatly 
simplify  the  database  definition  task  and  promote  the 
reusability  of  code.  Additionally,  the  concept  of  treating 
object  classes  as  objects  in  the  0-0  paradigm  allows  the 
uniform  treatment  of  data  and  metadata  in  a DBMS. 

Although  the  OFC  can  be  used  to  support  general  file 
sfrucfures  and  other  conventional  data  models  such  as  the 
relational  model,  it  is  designed  primarily  for  the  efficient 
support  of  object-oriented  semantic  data  models.  We  will 
briefly  describe  such  a model  in  the  next  section. 

3 • 2 The  Object-Oriented  Semantic  Association  Model  (OSAM*^ 

The  Object-oriented  Semantic  Association  Model  (OSAM*) 
[SU89]  is  a semantic  data/knowledge  model  that  possesses  the 
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concepts  and  features  described  above.  Objects  in  the 
application  worlds  are  modeled  as  OSAM*  objects  and  grouped 
together  into  classes  based  on  some  common  semantic 
properties.  An  OSAM*  class  consists  of  a specification 
part,  an  implementation  part,  and  an  extension  part.  The 
specification  of  a class  consists  of  1)  its  associations 
with  other  classes,  2)  operations  of  the  class  and  its 
instances,  and  3)  rules  and  constraints  that  govern  the 
class  and  its  instances.  A class  specification,  therefore, 
captures  the  structural  and  behavioral  semantics  common  to  a 
set  of  objects.  The  implementation  part  consists  of 
procedures  that  implement  specified  operations  and  rules  in 
the  form  of  methods.  The  class  extension  is  the  set  of 
object  instances  contained  in  the  class. 

The  expressive  power  of  OSAM*  lies  in  its  capability  to 
use  various  semantic  association  types  and  data  constructors 
in  a nested  and/or  recursive  fashion  to  model 
application-world  objects  and  semantic  relationships  among 
these  objects.  In  the  core  OSAM*  model,  there  are  five 
semantic  association  types:  aggregation  (A)  , generalization 
(G)  , interaction  (I)  , composition  (C)  , and  cross-product 
(X)  . An  example  shown  in  Figure  3.1  is  used  to  illustrate 
some  of  the  key  OSAM*  concepts,  particularly  the  A,  G,  and  I 
association  types.  An  educational  database  is  used  here 
mainly  because  of  the  ease  of  explanation  and  understanding. 
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Shown  in  Figure  3.1  is  the  educational  database  modeled 
by  using  OSAM*  in  the  form  of  a semantic  diagram 
(S-diagram) . An  S-diagram  is  an  OSAM*  graphical 
representation  of  classes  and  their  associations  by  means  of 
a network  of  labelled  nodes  and  edges.  Each  rectangular 
node  in  the  S-diagram  represents  an  entity  class  (E-class) , 
which  models  objects  in  the  application  world  that  are 
accessed  independently.  A circular  node  represents  a domain 
class  (D— class) , which  models  objects  that  primarily  serve 
as  descriptive  data  of  some  other  object.  A node  that 
represents  a defined  class  is  connected  to  its  constituent 
classes  via  directed,  labeled  edges.  Edges  are  grouped 
together  if  they  enter  into  the  same  association  with  the 
defined  class.  They  are  labeled  by  a semantic  association 
type  (A,  I,  G,  C,  or  X) . 

Edges  that  are  labeled  by  A and  I represent  the 
attributes  of  a defined  class.  The  constituent  classes  to 
which  these  edges  point  are  the  domains  from  which  the 
attributes  draw  their  values.  For  example,  the  E-class 
COURSE  has  an  aggregation  (A)  association  with  two 
constituent  D-classes  (C#  and  title)  and  a constituent 
E-class  (DEPARTMENT) . This  association  defines  the  three 
attributes  Department,  C#,  and  title  corresponding  to  the 
three  constituent  classes  of  the  association.  An  attribute 
has  the  same  name  as  its  domain  unless  specified  otherwise. 
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In  the  latter  case,  the  edge  is  explicitly  named  in  the 
^“diagram,  such  as  the  edge  named  "major"  in  the  class 
STUDENT.  Note  that  the  aggregation  association  is  similar 
to  the  concept  of  "instance  variables"  in  a conventional  0-0 
system. 

An  attribute  may  be  designated  as  the  user- spec if ied 
identifier  of  an  E-class.  Its  values  uniquely  identify  the 
objects  of  the  E-class.  For  example,  the  dash-marks  on  the 
edge  C#  of  the  class  COURSE  designates  that  attribute  as  the 
user-specified  identifier  of  COURSE. 

The  E-class  PERSON  has  two  types  of  associations: 
generalization  (G)  and  aggregation  (A)  . The  generalization 
association  is  similar  to  the  superclass-subclass  concept  of 
object-oriented  systems.  in  this  case,  the  class  PERSON  is 
a generalization  of  the  classes  TEACHER  and  STUDENT.  In 
other  words,  all  teachers  are  persons  and  all  students  are 
also  persons.  The  set  intersection  (SI)  between  its 
constituent  classes  TEACHER  and  STUDENT  specifies  that  a 
student  may  be  a teacher  and  vice-versa. 

The  aggregation  association  of  the  class  PERSON  models 
the  semantics  that  the  attributes  SS#  and  name  (having  the 
same  names  as  their  domains)  characterize  the  objects  of 
PERSON.  Since  a teacher  is  also  a person  in  the  application 
domain,  an  instance  of  TEACHER  inherits  the  two  attributes 
of  the  more  generic  class  PERSON. 


Additionally,  the 
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teachers  themselves  have  the  attribute  degree  that  is 
specific  to  teachers.  Similarly,  an  instance  of  STUDENT  has 
the  attributes  SS#  and  name  inherited  from  PERSON,  in 
addition  to  its  own  attributes  major  and  GPA.  In  general, 
instances  of  a subclass  in  an  OSAM*  database  inherit  all  the 
associations,  operations,  and  rules  from  its  superclass (es) . 

Note  that  TEACHER  itself  is  a generalization  of  the 
classes  FACULTY  and  TA.  Additionally,  TA  is  also  a subclass 
of  GRAD.  In  general,  a class  can  be  a superclass  of  many 
classes  and  a subclass  of  many  other  classes,  resulting  in  a 
generalization  (G-)  lattice,  as  illustrated  by  the  G-lattice 
in  Figure  3.1.  Furthermore,  the  set  exclusion  (SX)  between 
the  class  GRAD  and  the  class  UNDERGRAD  (also  between  FACULTY 
and  TA)  specifies  that  an  object  in  GRAD  cannot  be  in 
UNDERGRAD  and  vice-versa.  in  other  words,  in  this 
application  world  (database) , a graduate  student  cannot  be, 
at  the  same  time,  an  undergraduate  student  and  vice  versa. 

The  E-class  ADVISING  has  an  interaction  (I)  association 
and  an  aggregation  (A)  association.  The  interaction 
association  models  the  concept  that  an  object  in  this  class 
represents  an  interaction  between  or  among  some  objects  in 
its  constituent  classes.  in  this  case,  each  instance  of 
ADVISING  represents  the  fact  that  a faculty  member  is 
advising  some  graduate  students.  This  interaction  is 
described  by  the  attribute  startdate  (aggregation 
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association).  The  one-to-many  (or  l:n)  constraint  between 
the  constituent  classes  is  depicted  on  the  arc.  The 
interaction  association  is  similar  to  the  "relationship" 
concept  in  [CHE76],  but  has  no  counterpart  in  a conventional 
0-0  system. 

The  above  description  only  provides  the  reader  with  an 
overview  of  the  OSAM*  that  are  related  to  the  following 
presentation.  For  a more  detailed  and  formal  treatment  of 
the  semantic  associations  of  OSAM*,  refer  to  [SU89]. 
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Figure  3.1.  The  S-diagram  of  an  educational  database 
modeled  by  the  OSAM* . 


CHAPTER  4 

THE  OBJECT  FLOW  COMPUTER 


In  data  flow  processing,  data  originates  from  the  base 
files  and  flows  to  processors.  Under  asynchronous  control, 
processing  begins  at  each  processor  when  all  the  required 
data  arrives  at  the  processor.  The  result  then  flows  to  the 
next  processor (s)  as  the  input  to  that  processor.  In  this 
manner,  the  processing  continues  until  the  final  result  is 
produced.  0b~] ect — flow  processing  is  simply  a special  case 
of  data  flow  processing  in  which  the  primary  flow  through 
the  system  is  the  flow  of  "objects."  In  the  Object  Flow 
Computer  (OFC) , objects  are  represented  in  the  form  of 
object  identifiers  (OIDs) , which  originate  from  base  classes 
and  flow  to  processing  nodes  (PNs) . Processing  begins  at 
each  PN  when  all  the  required  objects  and/or  data  arrive  at 
the  PN.  The  produced  objects  and/or  data  then  flow  to  the 
next  PN  for  the  next  stage  processing.  In  this  manner,  the 
processing  continues  until  the  result  is  produced. 

In  this  chapter,  the  architectural  requirements  of  an 
OFC  to  support  the  object  flow  processing  is  presented. 
Within  the  OFC,  an  integrated  hardware/software  approach  to 
solving  the  very  large  non— numeric  processing  problem  is 
employed  to  achieve  significant  database  processing 
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improvement  in  a cost-effective  manner.  In  Section  4.1,  we 
present  the  general  OFC  system  architecture  and  the 
requirements  of  the  architecture.  In  Section  4.2,  the 
architecture  of  a processing  node  within  the  OFC  is 
described.  Finally,  a set  of  OFC  operators  is  defined  in 
Section  4.3. 


4 • 1 General  OFC  Architecture 

A general  OFC  architecture  is  shown  in  Figure  4.1.  It 
provides  a general-purpose  computing  environment  which  is 
controlled  by  the  control  computer.  The  database  needs  are 
supported  by  a network  of  processing  nodes  (PNs) . Each  PN 
contains  a part  of  the  database  (objects  with  the  associated 
methods)  and  a set  of  processors  to  process  the  data.  The 
mapping  of  the  database  onto  the  network  of  PNs  is  based  on 
the  semantics  specified  in  the  database  schema. 

The  PNs  are  connected  through  an  interconnection 
network.  The  general  requirements  of  the  interconnection 
network  are  as  follows:  (1)  it  has  to  facilitate  the  mapping 
of  the  database  schema  (i.e.,  classes  and  their 
associations)  onto  the  OFC  architecture,  and  (2)  it  should 
be  able  to  support  efficient  processing  of  the  OFC  two-phase 
processing  strategy  (to  be  described  later  in  Chapter  5)  . 
The  actual  design  of  the  interconnection  network  will  be 
discussed  in  Chapter  7 and  Chapter  8. 
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External  users  (which  can  be  human  users  or  processes 
running  locally  or  remotely)  make  database  requests  to  a 
DBMS  supported  by  the  OFC.  The  requests  are  handled  by  a 
Control  Computer  (CC).  The  database  requests  are 
translated,  modified,  and  optimized  in  the  CC  into  a set  of 
optimized  OFC  query  trees  called  object  flow  trees.  An 
object  flow  tree  specifies  the  required  OFC  and  user-defined 
operations  and  the  flow  of  the  execution  of  these 
operations.  The  object  flow  trees  are  then  mapped  onto  the 
PNs  for  object  flow  execution. 

4 • 2 Architecture  of  a Processing  Node 

Physically,  each  PN  consists  of  a general-purpose 
processor,  zero  or  more  special-purpose  co— processors , and 
the  associated  memory  devices.  Logically,  each  PN  consists 
of  a local  controller  (LC)  and  a set  of  special  function 
units  (SFUs)  , as  shown  in  Figure  4.2.  Each  SFU  implements 
an  OFC  operator.  (The  requirements  of  the  OFC  operators 
will  be  discussed  in  Section  4.3.)  Each  SFU  can  be 
implemented  in  software  (general-purpose  processor)  or  as  a 
function  within  a special-purpose  co-processor,  depending 
on  their  performance  requirements.  Data  within  a PN  is 
staged  into  the  main  memory  from  the  secondary  storage  and 
forms  a pipeline  of  data  blocks. 


The  processing  of  data  by 
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one  or  more  associated  processors  is  triggered  by  objects 
and/or  data  arriving  at  the  processors. 

The  internal  storage  structure  of  data  is  in  the  form 
of  generalized  relations  (G— relations)  , as  shown  in  Figure 
4.3(a).  Structurally,  a G-relation  is  an  un-normalized 
relation  in  which  an  attribute  of  the  G-relation  does  not 
have  to  be  atomic  and  can  itself  be  a G-relation. 
Semantically,  the  G-relation  representation  captures  the 
required  information  to  support  an  object-oriented  semantic 
data  model:  object  identity,  description  of  an  object 

(attributes  or  instance  variables)  , and  relationship 
(association)  of  an  object  to  other  objects. 

Correspondingly,  the  attributes  of  a G-relation  in  the  OFC 
can  be  classified  as  follows:  an  object  identifier  (OID) , a 
set  of  descriptive  attributes,  and  a set  of  association 
attributes.  An  OID  is  a system-generated  object  identifier 
that  is  assigned  to  an  object.  It  is  used  to  uniquely 
identify  an  object  within  the  system  and  is  assumed  to  be  in 
order  within  a G-relation.  The  descriptive  attributes 
contain  data  that  describes  the  object,  and  the  association 
tributes  contain  information  concerning  the  object's 
association  with  other  objects. 

In  the  OFC,  the  data  in  a G-relation  is  vertically 
partitioned  and  stored  in  a vertical  fragmentation  storage 
model,  as  shown  in  Figure  4.3(b).  This  storage  model  is 
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similar  to  the  one  introduced  in  DSM  [COP85,  KH088]  for  the 
relational  model.  However,  the  attributes  of  a relation  in 
their  model  are  fully  decomposed,  i.e.,  each  attribute  is 
stored  independently  with  its  surrogate  as  a binary 
relation.  In  the  OFC,  the  fragmentation  strategy  is  guided 
by  the  semantics  of  the  data.  More  specifically,  data  in  a 
G-relation  is  partitioned  as  a vertical  fragment  of 
descriptive  attributes  and  a vertical  fragment  of 
association  attributes.  The  corresponding  OID  is  stored 
with  each  fragmented  data  and  is  used  to  identify  and  relate 
the  data  of  an  object  which  has  been  fragmented. 

Figure  4.4  shows  the  G-relations  of  some  of  the 
E-classes  in  Figure  3.1.  As  an  example,  the  SECTION 
G-relation  has  a unique  identifier  Sec_OID,  a set  of 
descriptive  attributes  (s#,  r# , textbook),  and  a set  of 
association  attributes  (Tea_OID,  Stud_OID,  Cou_OID) . This 
^“relation  is  stored  internally  as  two  vertical  fragments 
D_SECTION(Sec_OID,  s#,  r#,  textbook)  and  A_SECTION (Sec_OID, 
Tea_OID,  Stud_OID,  Cou_OID) . The  Sec_OID  is  stored  in  a 
sorted  order  in  both  fragments.  Note  that  some  of  the 
G-relations  may  not  have  both  types  of  data.  For  example, 
PERSON  has  only  descriptive  attributes  (SS#,  name),  and  GRAD 
has  neither  of  them.  For  these  G-relations,  partition  is 
not  necessary. 
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4-3  Primitive  Operators  of  the  Object  Flow  Computer 

As  described  above,  a processing  node  logically 
consists  of  a vertical  data  fragment  and  its  associated  set 
of  SFUs,  each  of  which  implementing  an  OFC  operator.  As 
shown  in  Figure  4.5,  each  SFU  (represented  by  a circle)  has 
an  output  pipe  and  one  or  more  input  pipes.  The  light 
arrows  represent  the  controlling  input  pipes  and  the  dark 
arrows  represent  the  pipes  to  be  controlled.  The  SFU  is 
activated  when  the  required  blocks  of  objects  and/or 
(object)  data  have  arrived  and  been  staged  into  the  input 
pipes.  Results  are  transferred  out  of  the  SFUs  output  pipe 
when  a full  block  of  output  objects  or  data  has  been 
produced.  In  the  OFC,  "object"  is  defined  to  be  and 
represented  by  a system— wide  unique  object  identifier,  OID. 
And  data  is  defined  to  be  and  represented  by  an  ordered  pair 
(OID,v)  , where  v is  the  value(s)  of  an  (or  a set  of) 
attribute (s)  ; and  OID  is  the  associated  object  identifier. 
The  braces,  ( },  represent  a set  or  block  of  objects  or 
data. 

A number  of  primitive  database  operations  are 
distinguished  in  the  OFC.  These  operations  are  introduced 
to  account  for  the  fact  that  the  input  data  to  a processor 
may  have  some  useful  characteristics  (e.g.,  one  of  the  input 
is  in  sorted  order) . Thus,  different  algorithms  can  be 
employed  to  achieve  better  efficiency  and  can  be  implemented 
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in  hardware  as  distinct  operators.  The  operators  in  the  OFC 
are  described  below: 

Selector — (SL)  . The  selector,  illustrated  in  Figure 
4.5(a),  has  one  input  pipe,  holding  blocks  of  data  which 
consist  of  the  pairs  (OID,v) . This  operator  has  a 
selection  condition,  C,  specified  on  the  attributes  in  v. 
If  a pair  (OID,v)  of  the  input  data  satisfies  the  selection 
condition,  then  the  associated  object  (represented  by  the 
OID)  is  included  in  the  output.  Thus,  the  output  of  the 
selector  consists  of  blocks  of  objects  whose  v values  have 
satisfied  the  selection  condition.  Alternatively,  the 
a^r^-kute  values  v will  also  be  included  in  the  output.  The 
selector  is  an  OID  order-preserving  operator.  That  is,  the 
output  has  the  same  OID  order  as  the  input. 

Merge-Selector — (MS_L.  As  shown  in  Figure  4.5(b),  the 
merge-selector  has  an  input  pipe  holding  blocks  of  data 
(OID, v)  and  an  input  pipe  holding  blocks  of  objects,  oip 1 . 
Both  OID  and  OID'  are  ordered  and  unique,  as  represented  by 
the  underline  symbol.  Processing  begins  when  both  pipes  are 
full.  The  function  of  the  MS  operator  is  to  select  those 
ordered  pairs  (OID,v)  whose  OID  value  is  contained  in 
{OID!}.  Since  both  inputs  are  in  OID  order,  a simple  merge 
process  is  used.  The  merge-selector  is  an  OID  order- 
preserving operator. 
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Sorter — (SRI.  The  sorter,  shown  in  Figure  4.5(c), 

performs  the  normal  sort  operation  on  an  input  data  stream 
over  a specified  attribute  (s)  S,  which  can  be  OID  or  a 
subset  of  v and  produces  a sorted  output  stream.  Obviously, 
the  sorting  operation  over  an  attribute  will  disrupt  the 
order  of  the  identifiers  in  input  data  stream.  Therefore, 
the  sorter  is  not  an  OID  order-preserving  operator. 

Merge-Sorter — (MSR) . The  merge-sorter,  shown  in  Figure 

4.5(d) , performs  the  normal  merge-sort  operation  on  two 
input  streams  of  sorted  data  and  produces  a sorted  output 
stream.  The  merge-sorter  is  OID  order  preserving  with 
respect  to  the  input  OID  order. 

— CJN)  ■ This  is  the  regular  join  operation.  As 

shown  in  Figure  4.5(e),  the  join  operator  performs  a 
relational  join  operation  on  the  two  streams  of  data 
contained  in  the  two  input  pipes.  The  join  operation  is 
based  on  the  join  condition,  J,  which  can  be  defined  over 
the  identifiers  (e.g.,  OID1  op  0ID2)  or  over  the  attributes 
(e.g.,  a op  b)  , where  "op"  is  any  of  the  comparison 
operators.  The  output  is  the  result  of  the  join  operation 
in  the  form  of  blocks  of  (0ID1 ' , 0ID2  • , a,  b)  or  some 

projection  of  it.  The  join  operator  is  not  OID  order 
preserving. 

Restricted-Join (RJ)  . A restricted- j oin  operator 

performs  the  same  operation  as  regular  join  (JN)  except  that 
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the  join  condition  is  always  defined  over  the  OXD 
attributes.  Also,  one  of  its  two  input  OIDs  on  which  the 
join  operation  is  performed  is  ordered,  as  represented  by 
the  underlined  0ID1  shown  in  Figure  4.5(f)  and  unique.  The 
advantage  of  distinguishing  this  operation  from  the  regular 
join  is  that,  based  on  the  "unique  and  ordered"  feature  of 
one  of  the  input  OID  attributes,  a more  efficient  algorithm 
can  be  designed  and  implemented  for  the  restricted-join. 
Output  of  this  operation  can  be  (0ID1',  0ID2',a,b)  or  any 

subset  of  it.  The  restricted- join  operation  is  not  OID 
order  preserving. 

Semi-~| oin — (SJ)  . The  semi- join  operator  implements  an 

operation  similar  to  the  semi- join  operation  that  has  been 
iri  the  relational  literature.  In  other  words,  it 
functions  the  same  as  the  regular  join  operation  except  that 
there  is  no  concatenation  of  the  inputs  and  all  the  output 
data  result  from  only  one  of  the  input  streams.  The 
semi- join  operator  is  shown  in  Figure  4.5(g).  It  has  two 
input  pipes:  one  containing  data  in  the  form  of  (OID,v)  and 
the  other  containing  a pipeline  of  objects  (0ID2)  or  value 
(v).  The  join  condition  J can  be  defined  over  the  OIDs  or 

v's.  In  the  operation,  the  input  (0ID2)  or  (v)  is  used  to 


select  data  from  another  input  { (OIDl,v) } based  on  the  join 
condition.  The  output  of  a semi-join  operation  is  (OIDl',v) 
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or  any  projection  of  it.  The  semi-join  operator  is  not  OID 
order  preserving. 

Restricted-Semi- Join (RSJ) . The  restricted-semi- join 

operator  is  to  the  semi- join  as  the  restricted- join  is  to 
the  regular  join.  in  other  words,  the  restricted-semi- join 
functions  the  same  as  a semi- join  operation  except  the 
following  restrictions.  First,  the  join  condition  is 
defined  over  the  OIDs.  Secondly,  one  of  these  two  OIDs  to 
be  joined  must  be  ordered  (0ID1)  and  unique.  Finally,  an 
optional  restriction  is  that  the  output  has  the  same  OID 
order  as  0ID2.  With  this  restriction,  the 
restricted-semi- j oin  operation  is  OID  order  preserving  with 
respect  to  0ID2 . This  optional  restriction  is  applied  in 
the  result-processing  phase  of  a two-phase  PQPS  to  be 
described  in  Chapter  5. 

Set  Operators  (SI,SU,SD).  There  are  three  types  of 
set  processors,  corresponding  to  the  following  three  set 
operations:  set  intersection  (SI),  set  union  (SU) , and  set 
difference  (SD) . As  shown  in  Figure  4.5(i),  the  inputs  are 
objects  (OIDs)  or  data  (v's).  The  output  is  the  result  of 
the  set  operation,  also  in  the  form  of  objects  or  data. 
This  operator  is  not  OID  order  preserving. 

Constructor  (CN).  The  constructor  is  used  to  populate 
a skeletal  generalized  relation  (G-relation)  with  data.  The 
inputs  of  the  constructor  are  any  number  of  input  pipes 
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containing  data  to  be  used  for  the  construction,  as  shown  in 
Figure  4.5 (j).  The  data  in  these  input  pipes  contain  only 
values  but  no  OIDs.  The  construction  of  the  G-relation  is 
controlled  by  a skeletal  G—  relation  [G] , containing  a bit 
map  of  the  resultant  G—relation.  A "l"  in  the  controlling 
G_relation  corresponds  to  a value  in  the  output  G-relation 
and  a 0"  corresponds  to  an  null  value  in  the  output 
G-relation.  The  output  of  the  constructor  is  a fully 
populated  G-relation.  This  operator  is  essential  in  the 
result-processing  phase  of  the  two-phase  PQPS.  More  details 
about  the  skeletal  G-relation  and  output  G-relation  will  be 
given  in  Chapter  5. 
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Figure  4.1.  The  system  architecture  of  the  OFC . 


External  Users 
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Figure  4.2. 


The  architecture  of  a processing  node  (PN) . 
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(a)  A G-relation  R consists  of  object  identifier  (OID),  descriptive 
attributes  (D attr),  and  association  attributes  (A_attr). 
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(b)  Vertical  fragmentation  of  the  G-relation. 


Figure  4.3.  Storage  structure  for  data  in  the  OFC 
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D_R:  Descriptive  attributes  of  the  relation  R. 
A_R:  Association  attributes  of  the  relation  R. 


Figure  4.4.  G-relations  for  some  of  the  E-classes 


in  Figure  3.1. 


(d)  Merge-Sorter  (MSR) 
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Figure  4 . 5 (continued)  . 


CHAPTER  5 

QUERY  PROCESSING  IN  THE  OFC 

Traditionally,  a database  query  can  be  processed  by 
decomposing  the  query  into  a set  of  primitive  operations  and 
performing  the  operations  in  a parallel/pipelined  manner. 
We  call  this  processing  strategy  a conventional  parallel 
query  processing  strategy  (PQPS) . In  this  section,  a novel 
two-phase  PQPS  within  the  OFC  is  presented  and  an  example  is 
given  for  illustration.  We  begin  the  illustration  by 
describing  the  processing  of  the  example  query  using  a 
conventional  PQPS  in  Section  5.1  and  use  it  as  a basis  for 
contrast  to  the  two-phase  PQPS  described  in  Section  5.2. 
The  query  used  for  illustration  is  as  follows: 

Query  1 

For  each  faculty  member  who  is  in  the  college  of 
engineering  and  is  advising  graduate  students  who  are  taking 
courses  with  section#  >=  9000,  retrieve  the  faculty  member's 

name  and  specialty.  Also  retrieve  the  students'  names  and 
the  course  titles. 

5. 1 The  Traditional  POPS 

Figure  5.1  illustrates  the  use  of  the  relational  data 
model  and  a conventional  query  tree  to  process  the  query. 


42 


43 


Figure  5.1(a)  shows  a relational  representation  of  the 
educational  database  shown  in  Figure  3.1.  The  decomposed 
operations  and  general  flow  of  the  execution  of  these 
operations  are  specified  in  the  query  tree  shown  in  Figure 
5.1(b).  Note  that  this  query  tree  is  optimized  so  that  all 
projections  and  selections  are  at  the  bottom  of  the  query 
tree  and  that  only  the  necessary  attributes  required  to 
process  the  query  (i.e.,  those  required  in  some  selection  or 
join  conditions  or  in  the  output  list)  are  carried  up  the 
query  tree  during  execution.  Also,  the  number  of  join 
operations  are  minimized.  In  spite  of  the  optimization,  the 
number  of  join  operations  in  the  query  tree  is  substantial, 
due  to  the  nature  of  the  relational  data  model 
(normalization) . Moreover,  a large  amount  of  data  is 
unnecessarily  carried  along  and  transmitted  and 
retransmitted  among  processors.  For  example,  if  the  join 
operators  near  the  bottom  of  the  query  tree  have  high  join 
selectivity  factors  and  a join  operator  near  the  top  of  the 
query  tree  has  a low  factor,  then  a large  amount  of  data 
(e.g.,  attribute  values  that  are  simply  for  output  purpose 
such  as  Fac. specialty)  will  be  carried  along  unnecessarily 
and  be  eliminated  by  the  top  join  operator.  The  problem  is 
caused  by  the  fact  that  in  the  conventional  PQPS , the  two 
distinct  phases  of  query  processing  are  not  recognized  and 
output  data  are  carried  along  during  the  identification 
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(join)  phase.  The  transfer  of  the  unnecessary  data  from 
secondary  storage  to  processor  and  from  processor  to 
processor  is  very  costly,  especially  if  complex  objects  with 
a large  amount  of  data  are  processed. 

5 . 2 The  Two-Phase  POPS 

In  the  OFC,  query  processing  is  recognized  as  a 
composition  of  two  distinct  phases:  identifying  the 

data  from  a database  and  processing  the  qualified 
data  to  produce  the  final  result.  The  advantage  of 
distinguishing  these  two  phases  during  the  processing  is 
mainly  to  avoid  a considerable  amount  of  unnecessary  data 
accesses  and  transfers  from  secondary  storages  and  among 
processors.  To  illustrate  the  two-phase  PQPS  of  the  OFC, 
the  database  shown  in  Figure  3.1  is  used.  The  G-relations 
of  involved  E-classes  are  shown  in  Figure  4.4.  Recall  that 
descriptive  attributes  describe  the  properties  of  an  object, 
and  association  attributes  contain  information  concerning 
the  association  of  the  object  with  the  other  objects. 

The  two-phase  pqps  of  the  OFC  consists  of  an 
identification  phase  and  a result-processing  phase.  As  an 
illustration,  the  identification  phase  and  the 
result-processing  phase  for  processing  the  example  query  are 
shown  in  Figure  5.2(a)  and  Figure  5.2(b),  respectively,  in 
the  form  of  an  object  flow  tree  (OFT).  Like  a conventional 
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query  tree,  an  OFT  specifies  the  operations  and  the  general 
flow  of  the  execution  of  these  operations.  The  difference 
is  that  an  OFT  is  defined  with  respect  to  the  architecture 
of  the  OFC ; i.e.,  with  references  to  the  fragmented 
G-relations  and  the  object  flow  operators.  The  circles  in 
the  object  flow  tree  represent  object  flow  operators.  As 
before,  the  light  arrows  represent  controlling  pipes  and  the 
dark  arrows  represent  pipes  to  be  controlled.  The  direction 
of  the  arrow  specifies  the  direction  of  the  flow.  The 
braces,  { },  represent  a set  or  block  of  objects  or  data. 

The  processing  of  the  identification  phase  begins  at 
the  bottom  of  the  object  flow  tree.  The  selection  operators 
(SL)  are  used  to  select  objects  from  the  descriptive 
G-relations.  The  selected  objects,  in  the  form  of  OIDs,  are 
then  piped  from  one  SFU  to  another  (RSJ,  SJ,  JN,  etc.)  and 
are  used  to  select  and  relate  other  objects  (using 
association  G-relations).  During  this  process,  a skeletal 
G-relation  populated  with  only  OIDs  is  gradually  formed.  In 
this  example,  the  final  skeletal  G-relation  at  the  root  node 
contains  three  OID  columns  { (Fac_OID,  Stud_OID,  CouOID) } . 
Each  OID  in  this  skeletal  G-relation  represents  an  object 
that  qualifies  for  the  final  result.  The  instances  of  the 
skeletal  G-relation  are  piped  out  of  the  JN  operator  and 
serve  as  the  input  to  the  result-processing  phase. 
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In  the  case  of  a retrieval  query  such  as  in  this 
example,  the  task  of  result-processing  phase  is  to  collect 
the  descriptive  data  associated  with  the  objects  that  have 
been  identified  in  the  first  phase.  As  shown  in  Figure 
5.2(b),  data  are  collected  from  fragmented  descriptive 
G-relations  using  the  RSJ  operators.  The  RSJ  operators  in 
this  phase  will  enforce  the  optional  restriction  mentioned 
in  Section  4.3  so  that  output  data  preserve  the  same  order 
with  respect  to  their  input  OIDs  from  the  first  phase. 
Finally,  based  on  the  skeletal  G-relation,  the  CN  operator 
uses  the  collected  data  to  construct  the  final  result. 

As  we  can  see  from  Figure  5.2,  the  first 
(identification)  phase  is  quite  similar  to  a conventional 
query  tree  except  that  operators  and  data  to  be  processed 
are  different.  The  efficiency  of  the  first  phase  is  mainly 
due  to  the  fact  that  OIDs  are  transferred  rather  than  entire 
tuples.  As  a result,  a large  amount  of  data,  which  may  be 
unnecessary  at  the  end,  are  not  carried  along.  The  second 
(result-processing)  phase  is  an  additional  phase  as  compared 
to  the  traditional  PQPSs.  In  this  phase,  only  RSJ  and  CN 
operators  are  necessary.  The  required  number  of  RSJ 
operators  is  dependent  on  the  number  of  E-classes  that 
contain  an  output  attribute.  All  RSJ  operators  perform  data 
collection  operations  in  parallel. 
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Note  that,  for  simple  retrieval  queries,  the  two  phases 
can  be  performed  in  one  step  and  the  OFC  PQPS  is  reduced  to 
a conventional  one.  This  can  be  illustrated  in  the 
following  example. 

Query  2 

Get  the  titles  of  all  graduate  courses  (i.e.,  c#  >= 
5000) . 

To  process  this  query,  only  a selection  is  necessary. 
The  selection  is  performed  on  the  descriptive  attributes  of 
COURSE,  the  D_C0URSE . Once  a qualified  course  is 
"identified",  the  course  title  is  "collected"  for  output. 
In  other  words,  the  identification  phase  is  also  the 
result-processing  phase. 

Finally,  we  like  to  point  out  that  the  two-phase  PQPS 
is  not  only  suitable  for  retrieval  operations,  but  also 
applicable  to  other  system-defined  operations  (e.g., 
insertion,  deletion,  and  update)  and  user-defined 
operations.  Query  3 is  an  example  of  a query  using  a user- 
defined  operation. 

Query  3 

For  each  graduate  student  who  does  not  have  an  advisor, 
perform  the  user-defined  operation,  AssignAdvisor . 
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To  process  this  query  using  the  two-phase  PQPS,  we 
first  identify  the  appropriate  graduate  students  in  the 
first  phase  by  performing  a set  difference  operation  on  GRAD 
and  ADVISING  E-classes  over  Grad_OID  (i.e.,  GRAD. Grad_0 ID- 
ADVISING.  Grad_OID)  . This  set  difference  operation  is 
performed  by  using  the  ST  operator  defined  in  Section  4.3. 
Then  in  the  second  phase,  the  selected  objects  (in  the  form 
of  OIDs)  are  piped  to  the  PN  which  contains  the  method 
corresponding  to  the  user-defined  operation,  AssignAdvisor . 
Presumably,  this  method  is  stored  in  the  PN  which  contains 
the  descriptive  G-relation  of  GRAD.  In  general,  the  methods 
which  implement  the  user-defined  operations  for  the  objects 
and  classes  are  distributed  among  the  data  throughout  the 
OFC  • Thus,  in  addition  to  the  distributed  and  parallel 
processing  of  data,  application  code  in  the  form  of  methods 
can  also  be  distributed  among  the  data  and  executed  in  a 
distributed  and  parallel  fashion. 

5-3  Features  of  the  Two-Phase  POPS  and  the  ofc 

The  key  features  of  the  two-phase  PQPS  and  the  Object 
Flow  Computer  are  summarized  as  follows: 

Increase  horizontal  concurrency.  The  high-level  query 
is  decomposed  into  an  object  flow  tree  of  primitive 
operators.  Operators  at  the  same  object  flow  tree  level  can 
be  executed  in  parallel,  thus  increasing  the  horizontal 
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concurrency  of  the  system.  For  example,  the  SL  operators 
for  G-relations  D_DEPARTMENT  and  D_SECTION  in  the  processing 
phase  can  operate  in  parallel  and  all  the  RSJ  operators  in 
the  result-processing  phase  can  operate  in  parallel. 

Moreover,  since  files  are  vertically  fragmented  in  the 
system,  horizontal  concurrency  can  be  further  increased. 
Note  that  the  association  G-relations  are  generally  used  in 
the  identification  phase  (except  for  the  initial  selection) 
and  descriptive  G-relations  are  used  in  the 
result-processing  phase.  As  a result,  while  the  association 
part  of  a G-relation  (e.g.,  A_FACULTY)  is  being  processed  in 
the  first  phase,  the  descriptive  part  (e.g.,  D_FACULTY)  can 
be  used  in  the  second  phase. 

Increase  vertical  concurrency.  Pipelining  and  object 
flow  execution  allow  the  operators  at  different  levels  of 
the  object  flow  tree  to  be  executed  concurrently,  thus 
increasing  the  vertical  concurrency  of  the  system.  For 
example,  the  operators  from  the  identification  phase,  SL, 
RSJ,  SJ,  JN,  and  the  operators  from  the  result-processing 
phase,  RSJ  and  CN,  form  a 6-stage  pipeline.  These  operators 
are  all  operating  concurrently,  each  processing  the  block  of 
objects  or  data  that  was  transferred  to  it  from  the  previous 
operator (s)  in  the  pipe. 

Minimize — the — join — operator . The  join  operation  is 
recognized  to  be  the  bottleneck  operations  in  conventional 
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systems.  Much  work  has  been  devoted  to  improve  its 
performance.  In  the  OFC,  our  proposed  solution  is  simply  to 
replace  the  join  operation,  when  possible,  with  much  more 
efficient  special  join  operations  (RJ,  SJ,  and  RSJ) . This 
is  possible  in  the  OFC  because  associations  among  objects 
are  explicitly  specified  in  the  object-oriented  semantic 
data  model  and  the  "join"  of  these  objects  are  generally 
through  their  OID's.  As  illustrated  in  Query  1,  most  of  the 
join  operations  in  the  relational  model  implementation  shown 
in  Figure  5.1(b)  are  replaced  by  RSJ  and  SJ  operations. 

Minimize — unnecessary  data  transfer.  In  a conventional 
data  flow  system,  all  attribute  values  of  a relation  are 
transferred  into  the  main  memory  from  the  secondary  storage 
whether  or  not  they  are  relevant  to  a query.  This  is 
because  relations  are  not  vertically  fragmented.  Also,  by 
not  distinguishing  the  identification  phase  from  the 
result-processing  phase  of  query  processing,  all  attributes 
that  are  included  in  the  output  project  list  and/or 
participate  in  any  qualification  condition  in  any  part  of 
the  query  tree  are  carried  along  and  transmitted  (and 
retransmitted)  among  the  processors,  even  though  they  may 

eventually  be  eliminated  by  an  operator  far  up  in  the  query 
tree. 

In  the  OFC,  objects  in  the  form  of  OIDs  flow  through 
the  system  and  are  used  to  select  and  relate  objects  during 
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the  first  phase.  Since  an  OID  requires  far  fewer  bytes  to 
represent  than  an  entire  "tuple"  in  general  (a  4-byte  OID 
represents  up  to  232=  4G  tuples)  , each  block  can  hold  much 
more  OIDs  than  tuples.  The  information  contents  of  the 
blocks  transferred  among  processors  are  much  denser. 
Transfer  of  data  is  postponed  until  the  second  phase  when 
the  relevant  objects  have  been  identified.  Only  the 
required  data  is  transferred.  The  performance  improvement 
of  this  strategy  becomes  more  dramatic  when  complex  objects 
are  processed  since  they  are  defined  in  terms  of  many  other 
objects,  resulting  in  a large  amount  of  data  associated  with 
them.  By  using  their  OIDs  in  the  object  identification 
phase,  the  unnecessary  accesses  and  transfers  of  their 
descriptive  data  can  be  avoided.  Also,  since  a large  amount 
of  (sometimes  unnecessary)  data  is  not  repeatedly 
transmitted  through  the  network  of  pipelines,  the 
reliability  of  the  system  is  increased  along  with  the 
efficiency. 

Suitable — for — complex  queries.  As  one  can  see  from  the 
previous  examples  (and  will  be  proven  in  the  next  chapter) 
that  the  OFC  two-phase  PQPS  is  especially  efficient  for 
processing  complex  queries  and  in  which  a large  amount  of 
data  need  to  be  processed.  Note  that  this  is  desirable 
since  there  is  no  reason  to  design  special  architecture  and 
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processing  strategies  to  improve  the  performance  of  simple 
queries  which  may  take  only  milliseconds  to  process. 

Distribution — of — the — application  code.  In  general,  the 
methods  which  implement  the  user-defined  operations  for  the 
objects  and  classes  are  distributed  among  the  data  through 
the  OFC . Thus,  in  addition  to  the  distributed  and  parallel 
processing  of  data,  application  code  in  the  form  of  methods 
can  also  be  distributed  among  the  data  and  executed  in  a 
distributed  and  parallel  fashion.  This  is  seen  to  be  an 
important  next  step  in  the  database  computer  support  of 
database  management,  to  eliminate  the  artificial  barrier 
between  the  application  system  and  the  database  system. 
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(a)  The  identification  phase  of  the  OFC. 

Figure  5.2.  Two-phase  PQPS  using  the  OSAM* . 
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Final  result 


Figure  5 . 2 (continued)  . 


CHAPTER  6 

COST  ANALYSIS  ON  THE  PROCESSING  STRATEGY 

In  this  chapter , we  aim  to  show  that  for  processing 
complex  queries,  the  two-phase  PQPS  is  more  efficient  than 
the  conventional  PQPS.  If  the  complexity  of  a query  is 
determined  by  the  number  of  joins  contained  in  the  query, 
then  we  can  compare  the  efficiencies  of  different  strategies 
by  using  queries  of  different  number  of  joins.  In  Section 
6.1,  we  derive  the  general  cost  functions  of  query 
processing  under  two  strategies,  a conventional  PQPS  and  the 
two-phase  PQPS.  And  the  comparison  results  are  presented  in 
Section  6.2.  In  Section  6.3,  an  example  query  is  used  for 
the  performance  comparison  between  the  two-phase  PQPS  and 
the  traditional  PQPS.  The  results  coincide  with  those 
obtained  in  Section  6.2. 

6-1  The  Analytical  Model 

We  begin  with  the  assumptions  made  in  the  development 
of  the  evaluation  model.  First,  we  assume  that  data  of  an 
application  world  is  modeled  by  the  following  two  data 
models:  (A)  the  relational  data  model  and  (B)  the  OSAM*  data 

model.  We  assume  that  the  same  query  is  translated  into  a 
conventional  query  tree  and  an  object  flow  tree  in  case  A 
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and  case  B,  respectively,  as  shown  in  Figure  6.1(a)  and 
6.1(b).  For  simplicity,  we  consider  only  join  operations  in 
the  comparison.  Selections  and  projections  are  not 
considered  because  in  executing  a multiple  join  guery,  their 
costs  are  negligible.  The  number  of  joins  involved  is  n in 
case  A and  m in  case  B,  as  shown  in  Figure  6.1.  For  case  B, 
the  second  phase  of  the  object  flow  tree  also  contains  k 
RSJs  and  one  CN.  Note  that  most  of  the  "joins"  in  the 
object  flow  tree  in  case  B should  be  special  joins  (e.g.,  Rj 
and  SJ)  as  shown  in  the  examples  in  Chapter  5.  However,  we 
are  using  here  the  worst  situation  for  case  B in  which  all 
"joins"  have  the  performance  of  the  JN  operator. 

To  make  the  comparison  purely  on  processing  strategies 
in  different  models  and  eliminate  the  effect  caused  by 
hardware,  we  assume  an  ideal  machine  for  query  processing  in 
both  cases.  in  other  words,  each  of  the  query  trees  in 
Figure  6.1(a)  and  6.1(b)  is  executed  in  parallel  in  a 
multiprocessor  system,  whose  processors  are  interconnected 
in  exactly  the  same  manner  as  the  query  tree.  Therefore,  no 
bus  contention  exists  in  this  ideal  machine.  Each  node 
(operator)  of  the  query  tree  is  implemented  by  a processor 
and  hence  no  processor  contention  exists.  All  processors 
are  assumed  to  be  general-purpose  processors.  Thus,  the 
special-purpose  processors  of  the  OFC  are  not  utilized. 
Each  processor  has  a local  memory  of  the  same  size  and,  if 
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necessary,  a private  disk  (with  single  R/W  head)  for  storing 
temporary  result. 

The  join  factor  assumed  in  each  join  operation  is 
slightly  different  from  what  we  usually  see  in  the 
terature  in  evaluating  a single  join  operation.  In  the 
evaluation  of  a single  join,  the  join  factor  is  usually 
varied  within  a wide  range  (e.g.,  0.0001  to  0.9)  to 

determine  the  performance  under  different  join  factors 
[DEW84,  VAL84 ] . However,  in  evaluating  a multiple  join 
query,  large  join  factors  (0.1  can  be  large)  can  no  longer 
be  applied  if  a reasonable  result  size  is  expected.  To  give 
an  example,  let  us  assume  that  a query  contains  three  joins. 
The  base  (G-) relations  input  to  these  three  joins  contain 
oniy  1,000  tuples.  If  all  three  join  factors  are  0.1,  then 
the  result  after  performing  the  three  joins  will  have  as 
many  as  ( ( ( 1 0 0 0 * 1 0 0 0 * 0 . 1 ) * l o 0 0 * 0 . 1 ) * 1 o 0 0 * 0 . 1 ) 
1,000,000,000  tuples.  The  situation  is  even  worse  if  the 
base  relation  sizes  are  larger.  Therefore,  small  join 
factors  should  be  used  in  evaluating  multiple  join  queries. 
For  simplicity,  but  without  loss  of  generality,  in  this 
analysis  we  assume  that  in  both  case  A and  case  B the  number 
of  tuples  contained  in  all  operand  relations  are  the  same, 
and  all  join  factors  are  identical  to: 
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1/ (number  of  tuples  contained  in  an  operand  relation) 

Because  of  this  assumption,  the  intermediate  result  sizes  in 
terms  of  number  of  tuples  will  not  increase  but  remain  the 
same.  If  we  use  the  above  example,  the  join  factor  will  be 
1/1000  = 0.001.  So  the  result  will  also  contain 
(((1000*1000*0.001)  *1000*0.001) *1000*0.001)  = 1000  tuples. 

Each  of  the  base  relations  is  x byte  wide  in  case  A, 
as  shown  in  Figure  6.1,  and  is  x^  byte  wide  in  the  first 
phase  of  case  B and  x2  bytes  for  the  second  phase.  In  both 
cases,  the  width  of  a join  result  is  the  sum  of  widths  of 
two  input  relations.  Recall  that  the  xx  bytes  are 
association  data  (i.e.,  OIDs)  and  x2  bytes  are  descriptive 
data.  In  this  analysis,  each  OID  is  assumed  to  be  4 byte 
wide.  Thus,  x and  x2  are  in  general  much  larger  than  x^. 

Since  we  assume  each  relation  has  a same  number  of 
tuples,  x,  xlt  and  x2  can  also  be  viewed  as  the  relation 
sizes.  A constraint  on  these  variables  is  that  the  size  of 
the  results  obtained  under  two  cases  are  identical.  Namely, 

(n+1) x = kx2  (1) 

where  the  left  term  is  the  result  size  obtained  in  case  A 
and  the  right  term  the  result  size  of  case  B.  Another 
constraint  concerning  the  object  flow  tree  in  case  B is  that 
the  number  of  RSJ  operators  required  in  the  second  phase 
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(i.e.  , k)  should  be  greater  than  or  equal  to  the  number  of 
OID  columns  in  the  skeletal  G-relation  obtained  in  the  first 
phase  (i.e.,  k >=  (m+l)x1/4,  where  4 is  the  OID  size).  This 
is  true  because  each  OID  column  of  the  skeletal  G-relation 
is  used  to  collect  data  from  at  least  one  E-class,  as  shown 
in  Figure  5.2(b) . In  the  following  analysis,  we  simply  use 
the  equality  part  (i.e.,  the  worst  case)  of  this  constraint 
and  represent  it  as  follows: 

k = (m+l)x1/4  (2) 

In  other  words,  the  smallest  k is  used.  According  to 
equation  (1),  a small  k will  imply  a large  x2  if  the  result 
size  (n+1)  x is  a constant.  A large  x2  will  cause  a high 
cost  in  query  processing.  Therefore,  the  assumption  of 
equality  is  not  in  our  favor. 

We  assume  that  all  the  JN  processors,  in  both  Figure 
6.1(a)  and  6.1(b),  perform  a general  hash-based  join 
^l^o^ithm . The  algorithm  contains  two  phases:  hash  phase 
and  join  phase.  At  the  beginning,  relations  are  stored  in 
secondary  storages.  in  the  hash  phase,  operand  relations 
(assuming  R and  S)  of  a join  are  loaded  into  the  main 
memory,  hashed  into  a number  of  buckets,  and  then  unloaded 
to  the  secondary  storage.  In  the  join  phase,  pair  by  pair 
of  R and  S buckets  are  re— loaded  into  the  main  memory  for 
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performing  join  operation.  Result  will  be  sent  to  the  next 
node  immediately  (assuming  there  is  no  need  to  store  the 
join  result  back  to  the  secondary  storage) . The  reason  that 
we  select  a hash-based  join  algorithm  is  because  it 
outperf  orms  other  join  algorithms  (e.g.,  sort-merge  join, 
nested-block  join,  etc.)  in  most  cases  [ DEW84 ] . To  use  any 
slower  algorithm  will  only  increase,  to  our  favor,  the 
performance  difference  of  two  processing  strategies. 

For  the  second  phase  of  an  object  flow  tree,  we  assume, 
for  simplicity,  that  the  RSJ  operations  has  the  performance 
of  ordinary  join  operations.  Note  that  this  assumption  does 
not  favor  us  because  RSJ  operation  can  be  facilitated  by 
taking  advantage  of  the  sorted  order  of  one  input  data,  as 
presented  in  Section  4.3.  To  perform  a RSJ  like  a join 
based  on  hash- join  algorithm,  we  do  not  take  advantage  of 
this  feature.  Also,  we  assume  the  cost  of  performing  CN  is 
negligible  as  compared  to  the  join  operation. 

Since  each  node  in  both  query  trees  (case  A and  case  B) 
contains  only  one  processor  and  a single  disk  with  single 
R/W  head,  all  the  processes  in  a join  operation  (e.g.,  data 
loading  and  unloading)  are  performed  sequentially  inside  a 
processor.  However,  the  overall  execution  in  both  cases  is 
assumed  to  be  performed  in  parallel  as  much  as  possible. 
More  specifically,  data  are  loaded  and  processed  in  a 
pipelined  fashion  so  that  the  hash  phase  of  a join  is 
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completely  overlapped  with  the  hash  and  join  of  the 
preceding  join.  Temporary  result  transferred  from  one  node 
to  another  is  overlapped  with  the  processing  of  data  in  the 
nodes.  Finally,  we  consider  only  the  cost  of  accessing  data 
in  secondary  memory  since  the  processing  cost  is  small  in 
comparison. 

First,  we  consider  case  A,  the  conventional  PQPS  shown 
in  Figure  6.1(a).  For  one-join  query  (i.e.,  n=l) , the  total 
execution  cost  is  the  sum  of  the  cost  in  hash  phase  and  the 
cost  in  join  phase.  Since  the  relation  sizes  can  be 
represented  by  their  widths,  to  access  two  relations  of  size 
x for  hashing  (including  loading  and  unloading)  needs  4x 
accesses.  To  join  two  hashed  relations  (loading  only)  costs 
2x  accesses.  The  total  cost  is  therefore  6x. 


For  a two- join  query,  the  execution  cost  is  derived  in 
a similar  manner.  The  first  join  operation  costs  6x 
accesses  as  derived  above.  For  the  second  join,  the  two 
input  relations  are  a temporary  result  of  size  2x  and  a base 
relation  of  size  x.  Since  parallel  processing  in  this 
multi- join  case  is  assumed  (the  hash  on  both  operand 
relations  in  the  second  join  is  completely  overlapped  with 
the  hash  and  join  process  in  the  first  join) , the  hash  cost 
of  the  second  join  is  0.  The  join  cost  is  2x+x  = 3x.  Thus, 
the  total  execution  cost  for  this  case  is  6x+3x  = 9x.  For  a 
three- join  query,  the  execution  cost  is  13x.  The  derivation 
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can  be  easily  extended  to  a n-join  case.  An  iterative 
equation  shown  below  can  describe  the  execution  cost  of  the 
n-join  case. 

C_CONV(l,x)  = 6x 

C_CONV ( n , x ) = c_CONV(n-l,x)  + (n+l)x 

where  C_CONV  stands  for  the  Cost  of  using  a CONVentional 
PQPS . The  first  argument  is  the  number  of  joins  in  a query 
and  the  second  argument  is  the  size  of  a base  relation.  The 

closed  form  of  this  iterative  equation  is  obtained  as 
follows : 


C_CONV(n, x)  = (6  + (n+4) (n-l)/2)x  (3) 

For  the  two-phase  PQPS  shown  in  Figure  6.1(b),  the 
total  execution  cost  is  the  summation  of  costs  in  two 
phases.  For  the  first  phase,  the  execution  cost  is  derived 
in  the  same  manner  as  in  case  A.  Thus,  a value  of 

(6+(m+4) (m-l)/2)x1  is  obtained.  For  the  second  phase 
processing,  parallel  processing  is  also  assumed.  In  other 
words,  hashing  on  operand  G-relations  of  a RSJ  operation  in 
the  second  phase  is  performed  in  parallel  with  the  hash  and 
join  process  in  the  first  phase.  The  cost  in  the  second 
phase  is  therefore  the  cost  of  joining  two  hashed 
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G-relations  of  size  x2  and  4.  This  is  equal  to  (x2+4)  . 
Note  that  the  cost  for  data  collection  is  independent  of  k, 
the  number  of  RSJ  processors  in  the  second  phase.  This  is 
because  data  collection  processes  in  RSJ  processors  are 
performed  concurrently.  The  total  execution  cost  in  the 
object  flow  tree  is  therefore: 

C_TWOPH(m,x1,x2)  = (6  + (m+4) (m-l)/2)x1  + x2  + 4 (4) 

where  C_TWOPH  stands  for  the  Cost  of  using  the  TWO-PHase 
PQPS.  The  speedup  ratio  of  query  execution  using  the 
two-phase  PQPS  versus  the  conventional  PQPS  can  be  defined 
by  using  the  derived  cost  functions  as: 


Speedup  Ratio  = C_CONV(n,x)/C_TWOPH(m,xlf x2) 

(6  + (n+4 ) (n-l)/2)x 
(6  + (m+4) (m-l)/2)x1  + x2  + 4 


(5) 


where  x2  can  be  represented  in  terms  of  x and  x^  by 
substituting  k of  equation  (2)  into  equation  (1): 


x2  = 4*(n+l)*x/[ (m+l)*x1] 


(6) 
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6 • 2 Performance  Comparison 

The  speedup  ratio  of  the  two-phase  PQPS  versus  the 
conventional  PQPS  under  selected  n,  m,  x,  and  values  are 
given  in  Figure  6.2  and  6.3.  Figure  6.2(a),  6.2(b),  and 
6.2(c)  show,  respectively,  the  cases  of  small,  medium,  and 
large  input  operand  sizes.  The  speedup  ratio  is  observed  by 
varying  n and  m.  Figure  6.3(a),  6.3(b),  and  6.3(c)  show  the 
results  of  processing  queries  of  different  complexity.  The 
speedup  ratio  is  observed  by  varying  x and  x^. 

From  Figure  6.2,  the  following  conclusions  can  be 
drawn:  (1)  The  three  figures  show  that  the  larger  the  sizes 
of  the  operands  x and  xl  (i.e.  data  to  be  processed)  , the 
more  advantageous  the  two-phase  strategy  is  over  the 
conventional  strategy.  in  other  words,  the  speedup  ratios 
are  generally  greater  in  Figure  6.2(c)  than  those  in  Figure 
6.2(b),  which  are  greater  than  those  in  Figure  6.2(a). 

(2)  These  figures  also  show  that,  in  general,  the 
larger  is  n than  m (i.e.  larger  number  of  join  in  the 
conventional  PQPS) , the  higher  is  the  speedup  ratio.  This 
is  expected  to  be  the  normal  case  since  in  the  relational 
data  model,  data  which  should  be  treated  as  a unit 
semantically  are  often  scattered  among  many  relations  due  to 
the  normalization  process  of  the  relational  model.  However, 
where  n is  much  greater  than  m,  such  as  the  case  in  Figure 
6.2(b)  where  n=9  and  m=l,  this  trend  is  not  true.  This  is 
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because  if  n is  much  larger  than  m,  the  x2  value  obtained  in 
equation  (6)  will  be  much  larger  than  x.  For  example,  given 
x=60  and  x1=8,  then  x2  is  150  at  n=9  and  m=l,  but  only  75  at 
n— 9 and  m— 3 . A much  larger  x2  (than  x^)  will  significantly 
increase  the  C_TWOPH  cost  so  that  the  ratio  of  C CONV  to 
C_TWOPH  in  equation  (5)  will  drop. 

(3)  In  each  figure,  there  are  two  dotted  lines.  The 
one  where  the  speedup  ratio  is  equal  to  1 indicates  the 
cases  where  the  performance  of  the  two  PQPSs  is  the  same. 
The  one  crossing  over  the  curves  is  the  n=m  line,  which 
divides  the  performance  domain  into  two  parts.  The  part 
above  this  line  represents  n>m  division,  and  the  part  below 
represents  the  n<m  division.  For  the  n>m  division  (which  is 
the  expected  normal  case) , the  speedup  ratio  averages  in 
excess  of  10.  in  other  words,  the  two-phase  PQPS  can 
provide  normally  more  than  an  order  of  magnitude  of 
improvement  in  contrast  to  the  conventional  PQPS . It  is 
interesting  to  note  that  even  for  the  n=m  case  (numbers  of 
joins  in  case  A and  in  case  B are  equal) , the  two-phase  PQPS 
is  about  4 to  8 times  more  efficient  than  a conventional 
PQPS.  This  implies  that  if  the  two-phase  PQPS  is  applied  to 
a relational  database,  it  is  still  much  more  efficient.  For 
the  n<m  case  (which  is  an  unlikely  case) , the  two-phase  PQPS 
still  performs  better. 
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Figure  6.3  shows  the  results  by  varying  x and  x ^ but 
fixing  n and  m at  certain  values.  The  three  figures, 
6.3(a),  6.3(b),  and  6.3(c),  represent  the  result  of 

processing  simple,  medium,  and  complex  queries, 
respectively.  The  reason  that  we  consider  queries  with  a 
large  number  of  joins  (n=5  and  m=3  as  a medium  query  and  n=9 
and  m=5  as  a complex  query)  is  that  the  object-oriented 
semantic  association  model,  OSAM*,  is  designed  to  model 
structurally  and  behaviorally  complex  data  in  applications 
such  as  computer  integrated  engineering  and  manufacturing 
(CIEM)  [SU89].  In  such  applications,  large  number  of  object 
classes  (relations)  are  related  to  each  other  through 
various  associations  (relationships) . Queries  in  such  an 
environment  typically  involve  a large  number  of  joins  over 
object  classes  in  order  to  obtain  the  result. 

The  results  from  Figure  6.3  re-emphasizes  that  the  more 
joins  (n  and  m values)  is  required  in  a query,  the  more 
advantageous  is  the  two-phase  strategy  than  the  conventional 
strategy.  Also,  the  larger  the  amount  of  data  to  be 
processed  (size  of  x)  , the  more  advantageous  is  the 
two-phase  strategy. 

In  summary,  it  is  shown  that  the  two-phase  PQPS 
processing  the  OSAM*  model  can  provide  at  least  an  order  of 
magnitude  of  improvement  over  the  conventional  PQPS 
processing  the  relational  model  in  cases  where:  1)  a large 
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amount  of  data  needs  to  be  processed  and  2)  queries  to  be 
processed  are  complex  (i.e.,  large  number  of  "joins" ) in 
which  n>m.  Note  that  the  speedup  is  due  solely  to  the 
two-phase  PQPS  and  the  OSAM*  model.  No  other  features  of 
the  OFC  (such  as  special  OFC  operators  or  special  function 
hardware)  are  used  in  this  evaluation. 

6.3.  A Case  Study 

In  this  section,  we  study  a specific  case  and  compare 
the  performance  of  the  two-phase  PQPS  of  the  OFC  with  a 
conventional  PQPS.  For  this  comparison,  the  example  Query  1 
in  Chapter  5 and  Figure  5.1  and  Figure  5.2  are  used.  Some 
factors  that  are  ignored  in  the  previous  comparison,  such  as 
the  times  for  comparing,  hashing,  accessing  data  and  the 
time  for  performing  selection  operation,  are  considered  in 
this  comparison.  The  result  is  hence  more  accurate,  but 
less  general  since  it  is  based  on  a single  query.  In  the 
following,  we  first  present  the  model  used  in  the  analysis 
and  the  parameter  settings.  Then,  the  results  are  presented 
with  discussion. 

— ■ 3 • 1 — The  Analytical  Model  and  Parameter  Settings 

Same  as  before,  we  still  assume  ideal  machines  are  used 
in  both  processing  strategies.  In  other  words,  we  assume 
that  the  query  tree  in  Figure  5.1  and  the  object  flow  tree 
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in  Figure  5.2  are  executed  in  a multiprocessor  system,  whose 
processors  are  interconnected  in  exactly  the  same  manner  as 
the  trees.  Therefore,  no  bus  contention  exists  in  these 
ideal  machines.  Each  node  (operator)  of  the  trees  is 


implemented  by  a processor  in  the  corresponding  position  of 


an  ideal  machine.  For  fairness  in  comparison,  all 
processors  are  assumed  to  be  general-purpose  processors. 
Each  processor  has  a local  memory  of  the  same  size  and,  if 
necessary,  a private  disk  for  storing  intermediate  result. 
We  also  assume  that  processing  is  in  a pipelined  fashion  in 
both  cases.  Local  I/O  time  of  a processor  is  overlapped 
with  its  processing  time.  These  times  can  also  be 
overlapped  with  the  time  for  communication  with  its 
predecessor (s)  and  successor (s) . Note  that  the  processing 
time  and  communication  time  within  the  pipelined  execution 


is  not  modeled  in  the  previous  evaluation  model. 

For  join  operations,  we  also  assume  all  ordinary  and 
special  joins  are  performed  using  a hybrid-hash  based  join 
algorithm  presented  in  [DEW84 ] . All  base  (G-) relations  are 
assumed  to  have  the  same  number  of  tuples.  The  number  of 
attributes  in  a (G-) relation  are  shown  in  Figures  3.1  and 
4.4.  Each  attribute  contains  the  same  number  of  bytes.  All 
selection  operations  have  an  identical  selectivity  factor. 
All  join  operations  also  have  an  identical  join  factor.  The 
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parameters  and  their  values  used  in  the  analysis  are  listed 
in  Table  6.1. 

The  evaluation  of  query  execution  in  a pipelined 
processing  environment  has  been  published  in  some  previous 
works  [MIK88a,  MIK88b] . For  convenience,  we  repeat  here 
some  relevant  cost  functions  for  the  join  operation  that 
conceptually  illustrate  the  pipeline  effect  in  execution. 
In  these  equations,  we  assume  i and  j are  the  two  inputs  to 
a join  processor  k. 


Nok  = size  of  output  data  from  processor  k 
= (number  of  input  i tuples) 

* (number  of  input  j tuples) *join_f actor 
Tek  = time  at  the  completion  of  join  in  k 
= t_hash_stage  + t_probe  stage 

Trk  = time  at  which  the  first  data  block  is  output 
from  k 


= max(Tri, 


Trj) 


Tek  - max ( Tr i , Trj ) 



Nok 


where 


t_hash_stage 

= max  [input  i response  time, 
input  j response  time 

+ max (total  time  to  receive  all  input  j 
data  blocks, 

time  to  build  hash  table  for 
input  j ) ] 
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t_probe_stage 

= max  [input  i response  time 

+ total  time  to  receive  all  input  i 
data  blocks 
- t_hash_stage , 

total  time  to  hash  input  i data  (including 
load  and  unload)  and  probe  j hash 
table 

+ a * t_idle, 

communication  time  to  transmit  Nok  output 
data 

+ a * t_idle] 

t_idle  = input  i response  time  — t hash  stage 
a = 1 if  t_idle  >0;  a = 0 if  otherwise. 


Note  that  in  t_hash_stage  and  t_probe_stage,  we  have  assumed 
the  input  j data  arrives  at  processor  k first  (i.e.,  faster 
response  time)  and  the  hash  table  is  built  on  input  j data. 
In  Tek,  the  costs  of  hash_stage  and  probe_stage  are  summed 
together  because  these  two  stages  are  executed  sequentially. 
Inside  each  stage,  there  are  parallel  executions.  Only  the 
maximal  cost  in  those  parallel  steps  is  included  in  the 
total  execution  cost.  The  detailed  costs  of  hash  and  probe 
stages  can  be  found  in  the  paper  [DEW84  ] . These  three 
equations  for  Nok,  Tek,  and  Trk  can  be  iteratively  applied 
from  the  leaves  of  a query  tree  to  the  root  to  obtain  the 
total  query  processing  cost. 

6.3.2  Results  and  Discussion 


Based  on  this  more  accurate  evaluation  model,  the  total 
execution  times  for  both  processing  strategies  are 
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determined  by  varying  the  relation  size,  attribute  size, 
main  memory  size,  page  size,  join  factor,  and  selectivity 
factor . The  parameters  and  their  values  used  in  the 
analysis  are  listed  in  Table  6.1.  The  result  is  shown  in 
Figure  6.4.  For  clarity,  we  also  provide  the  exact  values 
of  selected  points  for  each  curve. 

In  Figure  6.4(a),  the  advantage  of  the  two-phase 
processing  strategy  over  the  conventional  strategy  becomes 
evident  as  the  relation  size  increases.  When  relation  size 
reaches  100,000  tuples,  two-phase  processing  is  about  15 
times  faster  than  conventional  strategy.  On  the  other  hand, 
when  the  relation  size  is  small,  these  two  strategies  are 
comparable.  The  reason  is  that,  in  this  case,  the  number  of 
disk  I/O's  required  for  processing  a query  is  the  main 
determining  factor.  in  other  words,  when  the  relation  size 
is  relatively  small  in  comparison  with  the  size  of  main 
memory,  the  cost  of  loading  and  transferring  the  small 
amount  of  unnecessary  data  during  the  object  identification 
phase  in  the  conventional  strategy  has  less  of  an  impact  on 
the  performance. 

Figure  6.4(b)  shows  the  comparison  under  the  variation 
of  attribute  size.  The  advantage  of  the  two-phase  processing 
strategy  over  the  conventional  strategy  again  widens  as  the 
attribute  si-ze  increases.  This  is  to  be  expected  because  as 
the  attribute  size  increases,  the  amount  of  the 
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"unnecessary"  data  carried  along  in  the  object 
identification  phase  of  the  conventional  strategy  increases 
accordingly. 

that  variation  of  attribute  size  is  equivalent  to 
the  variation  of  the  number  of  attributes  in  a relation. 
Therefore,  Figure  6.4(b)  can  also  be  used  to  represent  the 
result  of  varying  the  number  of  attributes  of  a relation. 
Furthermore,  Figure  6.4(b)  can  represent  the  result  of 
varying  the  "complexity"  of  complex  objects  in  a database. 
In  both  cases,  the  advantage  of  the  two-phase  processing 
strategy  increases  significantly  as  the  number  of  attributes 
and/or  the  complexity  of  objects  increase. 

Figures  6.4(c)  and  6.4(d)  show  that  the  total  execution 
time  decreases  as  the  main  memory  size  or  page  size 
increases.  This  is  expected  because  in  both  cases  the 
number  of  disk  accesses  decreases.  However,  at  each  point, 
the  performance  of  the  two-phase  strategy  is  at  least  an 
order  of  magnitude  better  than  the  conventional  strategy. 

We  also  evaluate  the  effects  of  selectivity  factor  and 
join  factor  on  performance  within  a range  of  selectivity 
factor  (0.01  - 0.15)  and  join  factor  (0.00001  - 0.0001). 
The  result  shows  that  the  two-phase  strategy  still 
significantly  outperforms  the  conventional  strategy.  (Note 
that  these  two  factors  cannot  be  too  large  if  a reasonable 
size  of  output  is  expected.) 
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In  conclusion,  the  above  evaluation  results  are 
consistent  with  the  results  obtained  in  the  previous  section 
that  the  two-phase  PQPS  does  provide  at  least  an  order  of 
magnitude  improvement  over  a conventional  PQPS  under  most  of 
the  conditions  (see  the  numbers  provided  in  Figure  6.4).  It 
also  underscores  a long-recognized  fact  that  the  I/O  problem 
is  one  of  the  main  bottlenecks  in  processing  a very  large 
database.  Due  to  the  fact  that  the  two-phase  processing 
strategy  is  designed  to  minimize  unnecessary  data  accesses 
and  transfers,  it  greatly  improves  the  system  performance 
and  ease,  if  not  alleviate,  the  I/O  bottleneck  problem. 
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Table  6.1.  Parameters  and  their  values  used  in  the 
evaluation. 

System  Parameters : 

word  - Word  size  4 bytes. 

main  - Main  memory  size  is  1 Mbytes. 

page  - Page  size  is  4 Kbytes. 

macc  - Time  to  access  one  word  is  200  ns. 

comp  - Time  to  compare  two  4 -byte  values  is  800  ns. 

hash  - Time  to  hash  a 4-byte  value  is  500  ns. 

comm  - Communication  bandwidth  is  10  Mbytes/sec. 

di-sk_I0_ran  ~ Operation  time  for  a random  disk  10  is  2 0 
ms . 

c^s^_^0_se<3  “ Operation  time  for  a sequential  disk  10  is  5 
ms . 


Workload  Parameters: 


°ID_size  - Size  of  an  object  ID  is  4 bytes. 
attr_fize  “ Size  of  an  attribute  is  20  bytes. 
rel_size  - Size  of  a relation  is  100,000  tuples. 
attr_per_tuple  - The  number  of  attributes  of  a tuple  is 
determined  by  the  query  in  the  example. 
sel_fac  - Selectivity  factor  is  0.1. 

jn_ fac  “ Join  (including  special  joins)  factor  is  0.0001. 
F - Fudge  factor  is  1.2. 


N°te;  The  condition  /|R|  *F  < | M | (see  [DEW84  ] ) must  be 
satisfied  for  hybrid-hash  join,  where  |R|  is  the  size  of  the 
relation  R to  be  hashed  and  |M|  is  the  size  of  main  memory. 
F is  a fudge  factor  which  accounts  for  the  different  types 
of  increments,  say,  in  hashing  and  sorting.  For  example,  a 
hash  table  containing  R will  require  a main  memory  size  of 

ISclS'C  | R I *F . 
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(a)  A conventional  query  tree  (b)  An  object-flow  tree 

Figure  6.1.  General  query  tree  representations  in 

conventional  systems  and  in  the  OFC  (not 
considering  selections  and  projections)  . 


(n>m) 
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(a)  Total  execution  time  vs.  relation  size 


Execution  Time  (sec) 


Attribute  Size 
(bytes) 


(b)  Total  execution  time  vs.  attribute  size 


Figure  6.4. 


Performance  comparisons  of  two  processing 
strategies . 


Execution  Time  (sec) 
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(c)  Total  execution  time  vs.  main  memory  size 


Execution  Time  (sec) 


Figure  6 . 4 (continued)  . 


CHAPTER  7 

DESIGN  AND  PERFORMANCE  EVALUATION 
OF  THE  OFC  SYSTEM  ARCHITECTURE 

7 • 1 Design  Considerations  for  the  OFC 

The  use  of  multiple  processors  to  implement  parallel 
algorithms  has  been  a trend  in  various  applications  to 
upgrade  system  performance.  In  the  database  machine  arena, 
multiple  processors  have  been  used  to  increase  parallelism 
at  three  different  levels:  operation  level,  query  level, 

and  system  level.  in  the  operation  level,  processors  are 
used  to  support  parallelism  within  a single  database 
operation  such  as  select,  sort,  join,  and  aggregation 
operations.  Query  level  parallelism  is  achieved  when 
operations  within  one  query  are  performed  in  parallel.  By 
system  level  parallelism,  we  mean  the  concurrent  execution 
of  multiple  queries  issued  to  the  system.  In  the  past,  most 
of  the  research  are  concentrated  on  the  first  two  levels 
(operation  level  and  query  level)  of  parallelism.  In  our 
design,  the  system  level  parallelism  is  also  considered. 
Thus,  the  evaluation  of  architecture  described  in  this  and 
the  next  chapter  will  consider  all  three  levels  of 
parallelism. 
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In  terms  of  the  multiprocessor  architecture,  the  two 
most  commonly  studied  architectures  are  Share  Everything 
(SE)  and  Share  Nothing  (SN) , as  shown  in  Figure  7.1  (a)  and 
(b)  respectively.  Both  of  these  two  architectures  have 
advantages  and  disadvantages,  and  the  choice  of 
architecture  depends  on  the  application.  For  example, 
processors  within  an  SE  system  can  communicate  with  each 
other  by  simply  writing/reading  messages  (data)  to/from  the 
shared  storage  unit  (SSU) . However,  the  contention  caused 
by  simultaneous  accesses  to  the  SSU  can  significantly 
degrade  the  system's  performance.  On  the  other  hand, 
although  processors  in  an  SN  system  can  access  their  local 
data  without  interference,  the  penalty  is  the  high  cost  for 
remote  accesses  to  the  data  in  other  processors. 

For  the  application  of  supporting  a high-level  object- 
oriented  semantic  data  model,  the  SN  architecture  is  the 
architecture  of  choice  for  the  following  reasons.  First,  a 
semantic  data  model  is  rich  in  semantics  and  contains  a 
number  of  general  modeling  constructs  for  explicitly 
describing  the  semantic  properties  of  database.  Semantically 
related  object  classes  in  this  model  are  linked  to  each 
other  through  semantic  associations.  Thus,  for  a 
semantically  meaningful  query,  the  object  classes  and  their 
relationships  referenced  by  the  query  form  a graph  which  in 
general  is  a subgraph  of  the  schema  graph.  If  the 
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interconnection  network  (ICN)  of  the  SN  architecture  can 
closely  reflect  the  semantic  database  schema  so  that  direct 
physical  communication  channels  are  provided  for  logically 
related  object  classes,  the  drawback  of  high  communication 
cost  in  SN  architecture  can  be  minimized  while  processing 
such  queries.  Secondly,  data  and  application  codes  in  such  a 
database  can  be  distributed  among  object  classes  and  the 
associated  processing  nodes  based  on  the  semantics.  Hence, 
operations  of  a query  that  needs  to  access  data  in 
different  classes  are  naturally  distributed  among  the 
relevant  processing  nodes.  As  a result,  access  contention 
during  the  query  processing  can  be  minimized  and  the 
intra-query  parallelism  can  be  maximized.  Furthermore,  for 
the  same  reason,  control  on  query  processing  can  be 
distributed.  In  this  manner,  a truly  distributed  concurrent 
processing  of  multiple  queries  is  realizable.  On  the  other 
hand,  if  data  files  are  stored  separately  from  the 
processors  as  in  SE  architecture,  a centralized  controller 
is  necessary  for  managing  loading/unloading  data  fragments 
for  processing  [DEW79 , KIT84,  KIT85] . Finally,  an  SN 
architecture  has  the  advantage  of  scalability  in  terms  of 
adding  new  P-M  subsystems  into  the  architecture.  Addition 
of  new  nodes  to  a SE  system  is  more  complex  because  of  its 
centralized  control  and  the  problem  of  system  saturation. 
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7 . 2 General  OFC  Architecture 

The  general  architecture  of  the  OFC  is  shown  in  Figure 
7.2.  It  is  a Share  Nothing  architecture  consisting  of  a set 
of  processing  nodes  (PNs)  connected  through  two 
interconnection  networks  (ICNs)  . Each  processing  node 
consists  of  an  A-PN  and  a D-PN.  A-PN  is  the  part  of  the 
processing  node  which  stores  and  processes  association  data 
(A-data)  of  a class.  D-PN  is  the  part  which  stores  and 
processes  the  descriptive  data  (D-data)  . The  separation  of 
a PN  into  A-PN  and  D-PN  is  due  to  the  fact  that  in 
different  processing  phases  of  the  OFC,  the  required  data 
and  processing  capabilities  are  different.  Both  the  A-PN 
and  D-PN  are  actually  logical  processing  nodes  which,  if 
necessary,  can  be  implemented  by  one  or  more  physical 
processors  and  storage  units.  Direct  local  communication 
channel  is  provided  between  each  pair  of  corresponding  A-PN 
and  D-PN.  The  direct  channel  is  chosen  since  the  A-  and 
D-data  of  a class  are  frequently  referenced  together  during 
query  processing. 

ICN1  is  the  interconnection  network  which  connects  all 
the  A-PNs  and  ICN2  is  used  to  connected  all  the  D-PNs.  The 
reason  to  have  two  separate  ICNs  is  stated  as  follows.  The 
goal  of  the  OFC  architecture  is  to  use  multiple  processors 
to  efficiently  support  OFC  query  processing.  Within  the 
OFC  two-phase  PQPS,  the  first  phase  is  mainly  to  identify 
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objects  of  interest  and  to  relate  objects  in  different 
E-classes.  Operations  of  the  different  levels  in  a query 
tree  (i.e.,  object  flow  tree)  can  be  processed  in  a 
pipelined  fashion.  The  amount  of  data  need  to  be  processed 
is  small  (i.e. , OXDs) . in  the  second  phase,  messages  are 
sent  to  the  identified  objects  to  perform  system-defined 
operations  such  as  retrieval,  update,  etc.,  and/or 
user— defined  operations.  in  this  phase,  the  amount  of  data 
to  be  processed  can  be  very  large,  compared  to  the  data  to 
P^ocsssed  in  the  first  phase.  Because  of  the  different 
requirements  in  two  phases,  an  architecture  with  two  ICNs 
is  chosen.  The  appropriate  capacity  of  each  ICN  and  the 
processing  power  of  a PN  required  for  each  phase  need  to  be 
carefully  analyzed  in  order  to  support  the  two-phase  PQPS 
properly.  Evaluation  should  be  performed  on  the  two 
processing  phases  individually  to  determine  the  most 
suitable  interconnection  structure  for  each  processing 
phase.  We  stress  here  that  some  DB  machine  designs  simply 
select  a certain  interconnection  structure  (such  as 
hypercube)  for  the  machine's  architecture  without  any 
evidence  to  show  the  necessity  of  using  it.  In  our  study, 
we  carefully  evaluate  the  system's  performance  under 
^ ^ ^ ^ e r conditions  to  determine  the  appropriate 
interconnection  for  each  processing  phase. 
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In  the  following,  we  shall  examine  the  performance  of 
the  OFC  query  processing  strategy  on  different  ICNs  which 
are  of  high,  medium,  and  low  degrees  of  connectivity, 
respectively,  and  study  their  performances.  The  dollar  cost 
of  the  ICNs  are  also  considered  in  the  evaluation.  The  one 
that  has  the  best  cost/performance  ratio  will  be  selected 
as  the  OFC  system  architecture.  Based  on  the  evaluation 
result,  minor  modification  may  be  made  to  that  architecture 
to  further  improve  the  performance. 

7 • 3 Performance  Evaluation  of  the  OFC  System 

As  discussed  earlier,  the  OFC  architecture  contains  a 
network  of  processing  nodes  (PNs)  interconnected  by  two 
ICNs.  Each  PN  consists  of  a set  of  processors  and  storage 
units  (i.e.,  main  memory  and  secondary  storage).  To  find  a 
suitable  architecture  for  the  OFC,  the  decisive  factors  are 
the  types  of  ICNs  interconnecting  the  PNs,  the  number  of 
PNs,  the  number  of  processors  and  storage  units  in  each  PN, 
and  the  type  of  interconnection  among  those  processors  and 
storage  units  within  a PN,  etc.  Values  of  these  variables 
can  also  be  dependent  on  other  non-architectural  parameters, 
such  as  database  size,  the  number  of  classes,  the  granule  of 
parallelism,  and  so  on.  The  common  method  to  determine  the 
optimal  values  for  these  variables  is  either  by  performing 
analysis  using  an  abstract  model 


or  by  performing 
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simulation  on  the  entire  OFC  system.  Analytic  approach  is 
inexpensive  in  terms  of  effort,  but  the  result  is  not  as 
reliable  as  the  result  of  simulation.  The  simulation 
approach,  however,  is  too  costly,  in  terms  of  time,  to 
simulate  every  detail  for  a complex  system  such  as  the  OFC. 

In  our  evaluation,  we  take  a combination  of  the  above 
two  approaches.  More  specifically,  the  evaluation  of  the 
OFC  system  is  based  on  a simulation  system  that  is  built  on 
top  of  an  analytic  model.  The  simulation  system  simulates 
the  concurrent  multiple  query  processing  on  the  global  OFC 
architecture  level , and  the  analytic  model  analyzes  the 
cost  of  processing  data  locally  stored  in  each  PN. 

Consider  that  there  are  a number  of  queries  to  be 
processed  simultaneously  in  the  OFC.  Before  execution, 
each  of  these  queries  is  decomposed  into  an  object  flow 
tree  of  operations  and  mapped  onto  the  OFC.  As  a result, 
each  PN  may  need  to  perform  several  different  operations 
from  different  queries.  For  each  operation,  the  processing 
time  spent  for  each  data  block  in  a PN  is  computed  by  using 
analytic  method.  Output  of  an  operation  is  transferred  to 
other  PN (s)  through  the  ICN  for  the  next  stage  processing. 
A simulation  program  is  written  to  simulate  the  concurrent 
multiple  query  processing  scenario  within  the  OFC.  This 
evaluation  is  expected  to  lead  us  to  find  a promising  global 
architecture. 
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l-n  this  section,  we  introduce  a DAtabase  Computer 
Performance  Evaluation  System  (DACPES)  that  was  used  in  our 
evaluation.  Details  of  each  of  the  modules  within  the 
DACPES  are  described  in  the  following  subsections.  Then, 
an  analytic  model  for  PN  local  data  processing  is 
introduced. 

• 1 — The  Database  Computer  Performance  Evaluation  System 

The  architecture  of  a DAtabase  Computer  Performance 
Evaluation  System  (DACPES)  is  shown  in  Figure  7.3. 
Conceptually,  it  is  a simulation  program  consisting  of  the 
following  modules:  user  specification,  benchmark 
specification,  generation  of  database  schema  and  gueries, 
generation  of  system  architectures,  mapping  of  DB  schema 
(i.e.  , data  placement)  and  queries  onto  system 
architectures,  and  query  execution.  An  overview  of  this 
system  are  depicted  as  follows.  Given  some  user  specified 
parameters  and  a well-established  benchmark  (to  be  described 
later) , a few  candidate  system  architectures  of  different 
degrees  of  connectivity  can  be  generated.  An  experimental 
DB  schema  and  a set  of  queries  are  also  generated.  For 
reason  of  generality,  the  DB  schema  is  randomly  generated 
rather  than  using  any  particular  schema.  In  the  next  step, 
the  generated  DB  schema  and  queries  are  mapped  onto  the 
candidate  architectures  for  execution.  Since  our  goal  is 
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to  find  a suitable  architecture  for  each  of  the  query 
processing  phases,  all  combinations  of  the  candidate 
architectures  are  used  in  the  experiment  so  that  their 
performances  are  obtained.  From  the  results,  we  select  the 
most  cost-effective  combination  as  the  OFC  system 
architecture.  in  the  following,  we  discuss  in  detail  the 
functions  and  design  issues  (if  any)  for  each  of  the 
modules . 

7 . 3 . 1 . 1 — The  user  specification  module 

The  program  starts  with  some  interactions  with  a 
designer  about  the  parameters  used  in  this  evaluation  run. 
The  parameters  include  the  number  of  classes  in  a database, 
the  query  arrival  rate,  the  number  of  queries  to  be 
generated,  the  number  of  PNs  in  the  system  architecture, 
the  bandwidth  of  the  communication  channel  in  each  type  of 
architecture,  and  the  processing  power  (to  be  defined  later 
in  Section  7. 3. 1.3),  etc.  These  parameters,  along  with  the 
benchmark  specification,  will  be  used  to  generate  a DB 

schema,  a set  of  queries,  and  also  to  generate  the 
candidate  architectures. 

7 • 3 • 1 • 2 Benchmark  specification  module 

In  this  evaluation,  we  adopt  a well-established 
benchmark  [BOR84]  (multiple  user  version)  for  evaluating  the 
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performance  of  our  system.  However,  minor  modifications  are 
made  to  that  benchmark  to  adapt  it  to  our  use.  This 
benchmark  is  stored  in  a file  external  to  the  evaluation 
system.  It  is  mainly  used  to  generate  a set  of  queries  for 
execution  in  the  OFC.  In  the  following,  we  briefly 
describe  the  benchmark,  and  the  modifications  made  to  it. 

As  mentioned  in  the  original  benchmark  [BOR84],  three 
principal  factors  that  affect  database  system  performance 
in  a multi-user  environment  are: 

- multiprogramming  level 

- query  mix 

- degree  of  data  sharing 

Each  of  these  factor  defines  an  axis  in  the  performance 
space  that  can  be  varied  independently.  Multiprogramming 
level  in  the  original  benchmark  [BOR84]  was  defined  as  the 
number  of  queries  being  executed  concurrently.  But 
"executed"  in  that  paper  is  used  in  a broad  sense  in  that 
all  of  the  phases  a query  passes  through  are  included. 
These  phases  are  parsing,  access  planning,  execution,  and 
final  result  output.  In  our  case,  we  interpret  the 
multiprogramming  level  in  a strict  sense  to  include  only 
those  queries  currently  in  the  execution  phase.  The  reason 
is  that  the  other  phases  are  performed  either  in  host 
computer  or  in  an  interface  controller.  They  contribute 
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or  nothing  to  the  determination  of  type  of  ICNs  to 
be  used  among  processors. 

As  for  query  mix,  we  divided  the  queries  into  four 
types  as  in  the  benchmark  [BOR84],  based  on  the  extent  of 
consumption  of  two  main  system  resources:  CPU  cycles  and 
disk  bandwidth.  The  four  types  of  query  mix  are  as  follows: 


query  type  CPU  util 


Disk  util 


I 

Low 

II 

Low 

III 

High 

IV 

High 

Low 

High 

Low 

High 


Examples  of  each  type  of  queries  can  be  found  in  the 
benchmark  [BOR84]  and  are  also  listed  in  below. 


query  type 


example 


Select  1 tuple  from  10,000  tuples,  using 
a clustered  index 

Select  100  tuples  from  10,000  tuples 

Join  10,000  tuples  with  1,000  tuples 
over  a clustered  attribute  and  produce 
1,000  tuples 

Select  1,000  tuples  from  10,000  tuples 
Select  1,000  tuples  from  another  10,000 
tuples 

Join  two  1,000  tuple  relations  to  form  a 
1,000  tuple  relation  which  is  then  joined 
with  another  1,000  tuple  relation 


The  join  operations  in  their  benchmark  are  assumed  to  be 
replaceable  by  special  joins  in  our  case,  with  an  identical 
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frequency  of  occurrence  for  the  SJ,  RJ,  and  JN  operations. 
The  percentage  of  occurrence  of  each  type  of  queries  is  70% 
for  type  I,  10%  for  each  of  the  other  types.  Due  to  the 
fact  that  the  OSAM*  data  model  is  mainly  designed  for 
supporting  advanced  applications  in  which  databases  are 
more  complex  both  structurally  and  semantically,  we  also 
test  some  cases  that  have  a higher  occurrence  percentage 
for  complex  queries  (such  as  query  type  III  and  type  IV)  to 
investigate  the  system  performance  in  those  environments. 
The  cases  we  tested  include  40%  of  type  I query  and  20%  for 
the  other  three  types,  and  25%  for  each  type  of  queries. 
r^e  types  of  query  mix  investigated  in  our  study  are 
summarized  as  follows: 

type  of  percentage  of  each  type  of  queries 
query  mix  type  I type  II  type  III  type  IV 


LCQM  70%  10%  10%  10% 

MCQM  40%  20%  20%  20% 

HCQM  25%  25%  25%  25% 

where  LCQM,  MCQM,  and  HCQM  stand  for  low,  medium,  and  high 
complexity  query  mix,  respectively. 

According  to  the  original  benchmark  [BOR84],  there  are 
two  relation  sizes,  1,000  tuples  and  10,000  tuples.  The 
numbers  of  output  tuples  for  each  query  of  type  I,  II, 
III,  and  IV  are  1,  100,  1000,  and  1000,  respectively.  Each 
output  tuple  may  have  2 to  4 attributes.  The  size  of  a 
base  tuple  is  defined  as  182  bytes  in  the  benchmark. 
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We  do  not  consider  data  sharing  in  our  evaluation  for 
simplicity  reason.  The  relative  performances  of  query 
execution  using  different  architectures  can  be  explicitly 
determined  without  considering  the  factor  of  data  sharing. 
Therefore,  we  defer  the  examination  of  effect  of  data 
sharing  on  performance  until  a later  design  stage  when 
refinement  of  the  OFC  architecture  is  anticipated. 

7 • 3 • 1 • 3 Generation  of  system  architecture  module 

In  this  module,  candidate  system  architectures  are 
generated  for  later  use  in  the  simulation  process.  Values 
of  relevant  parameters  in  generating  the  architectures  are 
given  as  user  specifications.  These  parameters  include  the 
number  of  PNs,  the  processing  power  of  each  PN,  and  the 
bandwidth  of  communication  channels,  are  given  as  user 
specifications.  In  the  following,  we  discuss  the  issues 
related  to  the  selection  of  the  architectures  used  in  our 
evaluation. 

Conceptually,  the  OFC  architecture  contains  a set  of 
PNs  and  some  interconnection  network  interconnecting  these 
PNs.  The  PNs  store  and  process  the  database  and  the  ICN 
provides  a wide  enough  bandwidth  for  data  transmission 
during  query  processing.  We  have  mentioned  earlier  that  our 
goal  of  the  OFC  architecture  design  is  to  find  the  most 
suitable  architecture  to  support  each  phase  of  the 


query 
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processing.  Suppose  that  (1)  the  OFC  system  operates  in  a 
user  environment,  (2)  the  accesses  to  relations 
(classes)  by  the  users  are  randomly  distributed,  and  (3) 
each  PN  stores  one  class  of  objects.  Since  a query  graph 
is,  in  general,  a sub-graph  of  the  schema  graph,  the  best 
topology  of  the  system  architecture  is  the  one  that  can 
perfectly  embed  the  schema  graph  without  distortion.  A 
formal  definition  of  the  perfect  embedding  is  given  as 
follows.  A perfect  embedding  of  a graph  G in  a graph  H is 
that  in  the  mapping  of  G onto  H,  each  pair  of  adjacent  nodes 
(or  vertices)  of  graph  G,  after  embedding,  are  still 
adjacent  in  H.  By  doing  this,  the  system  architecture  can 
exactly  matches  any  types  of  query  graphs  so  that  not  only 
the  processing  of  queries  can  be  automatically  distributed 
among  the  PNs  of  a system,  but  data  transfers  among  PNs  are 
also  dispersed  on  the  provided  communication  links. 

On  the  other  hand,  a system  with  less  communication 


links  but  having  a high  communication  bandwidth  may  still 
give  good  performance.  Because  of  this,  the  architectures 
chosen  for  this  study  should  be  representative  of  the 
different  degrees  of  connectivity  and  communication 
bandwidths . In  the  evaluation,  we  choose  hypercube 
(dimension  to  be  determined  later) , mesh  (2-D) , and  bus 
architectures  to  represent  high,  medium,  and  low 
connectivity  systems,  respectively. 


The  key  features  of 
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these  3 ICN  architectures  to  be  studied  are  communication 
bandwidth  and  degree  of  connectivity  and  their  effect  on 
performance.  They  are  summarized  as  follows: 


ICN  type  | 

comm.  | 

degree  of 

1 

bandwidth  | 

connectivity 

hypercube | 

low  | 

high 

mesh  | 

low  | 

medium 

bus  | 

high  | 

low 

More  specifically,  higher  degree  of  connectivity  provides 
more  communication  channels  to  support  data  processing  in 
the  database  schema  level.  We  assume  that  a hypercube  graph 
with  a high-enough  dimension  can  always  perfectly  embed  a 
schema  graph  so  that  data  transfers  within  a single  query 
is  always  contention-free.  Bus  system  represents  a case 
that  contention  always  exists,  and  mesh  the  case  in 
between.  On  the  other  hand,  bus  in  general  has  a much 
higher  communication  bandwidth. 

Another  reason  for  choosing  these  three  ICNs  for 
evaluation  is  related  to  feasibility,  meaning,  these  ICNs 
ar”e  general—  purpose  and  commonly  used  in  distributed 
computing  environment.  Standard  communication  protocols 
for  these  ICNs  are  available.  This  is  one  of  the  important 
considerations  in  early  stages  of  a design,  since  the 
result  of  the  study  should  guarantee  to  be  technically 
implementable. 
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Another  parameter  to  be  considered  in  the  evaluation  of 
the  ICN  architectures  is  the  BW_ratio.  This  is  a new 
parameter  introduced  in  the  study  to  represent  the  ratio  of 
bandwidths  (BWs)  between  different  ICNs  taking  into  account 
the  difference  in  dollar  cost.  For  example,  if  the  total 
cost  to  be  spent  on  the  interconnection  network  is  fixed, 
then  the  bandwidth  of  bus  would  be  much  higher  than  those  of 
cube's  or  mesh's  with  the  same  cost.  This  is  true  since  the 
bus  system  needs  only  one  physical  communication  channel 
(versus  a large  number  of  channels  in  a cube  and  mesh) . In 
the  evaluation,  we  assume  there  are  two  levels  of 
communication  bandwidths  x and  y,  and  define  a term  BW  ratio 
as  x:y,  where  x is  the  bandwidth  of  each  channel  between  a 
pair  of  nodes  in  cube  and  mesh  systems,  and  y is  the 
bandwidth  of  the  bus  communication  system.  In  this 
definition,  we  have  assumed  for  simplicity  that  the 
bandwidths  of  each  individual  channel  in  cube  and  mesh 
system  are  the  same  (or  at  the  same  level)  for  the  same 
cost.  In  this  study,  system  throughput  are  evaluated  for 
three  different  BW_ratios:  1:1,  1:10,  and  1:100.  That  means 
if  an  ethernet  (whose  bandwidth  is  10  Mbits/sec)  is  used 
for  bus  communication,  and  if  a BW  ratio  of  1:1  is  assumed, 
then  the  bandwidth  of  each  individual  channel  in  cube  and 
mesh  systems  is  also  10  Mbits/sec.  If  a BW_ratio  of  1:10 
is  used,  then  the  bandwidth  of  each  individual  channel  in 
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cube  and  mesh  is  1 Mbits/sec.  By  using  different  BW_ratios 
in  the  evaluation,  a reasonable  performance  comparison 
(considering  not  only  bandwidth  of  ICN  but  also  the  dollar 
investment)  among  the  three  architectures  can  be  obtained. 

Finally,  we  assume  for  fairness  reason  that  the  number 
of  PNs  used  in  comparing  different  ICNs  are  the  same,  and 
each  PN  has  the  same  processing  capability.  As  for  the 
architecture  of  processors  and  disk  units  inside  each  PN, 
we  take  a top-down  design  approach  to  achieve  a best  design 
for  each  PN.  In  other  words,  at  the  current  stage,  we 
examine  the  global  ICN  among  PNs  and  ignore  the  detail 
organization  inside  a PN  by  simply  assuming  that  each  PN  is 
modeled  as  a single  logical  processor  and  a single  logical 
disk  unit.  The  power  of  this  processor  and  disk  unit  can 
vary,  say,  from  5 MIPS  and  2 0 ms  random  access  time  to  50 
MIPS  and  2 ms  (10  times  more  powerful).  We  define  a 
conceptual  term,  processing  power  (PPW) , to  represent  this 
abstract  concept.  Thus,  the  communication  contention  issue 
inside  a PN  is  purposely  ignored  here.  The  rationale  behind 
this  is  that  an  optimal  PPW  should  be  obtained  before  a 
detailed  PN  architecture  can  be  designed.  In  our  future 
design,  the  obtained  optimal  PPW  will  be  used  as  a goal  to 
be  achieved  in  the  design  of  PN  architecture. 
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!-:  3 • 1 • 4 Generation  of  DB  schema  and  queries  module 

This  module  generates  a DB  schema  and  a set  of  queries 
based  on  the  previously  defined  benchmark.  The  generated 
schema  and  queries  are  stored  in  a file  for  later 
simulation  of  query  execution. 

Schema — Generation.  For  fairness  reason,  it  is  better 

to  compare  the  same  database  (with  the  same  schema)  on  the 
candidate  architectures.  However,  the  obtained  result  can 
be  dependent  greatly  on  the  characteristics  of  the  selected 
database,  e.g.,  the  number  of  E-classes,  the  number  of 
associations  existed  in  each  E-class,  etc.  It  is  therefore 
important  to  understand  the  features  of  OSAM*  schema  in 
general  so  that  the  evaluation  can  be  based  on  a general 
database.  By  studying  several  databases  modeled  by  the 
OSAM*,  we  found  that  the  most  obvious  and  important  factor 
that  directly  affects  the  distribution  of  queries  on  a DB 
schema  is  the  average  number  of  association  links  on  each 
class.  This  value  is  found  to  be  2 to  4 in  average.  It 
is  rarely  the  case  that  each  E-class  of  a database  may  have 
an  average  number  of  associations  outside  that  range. 
Another  factor  that  affects  the  formalization  of  a DB 
schema  is  the  number  of  E-classes  existed  in  a DB.  This 

number  is  given  as  a part  of  the  user  specifications.  Note 
that  the  type  of  association  does  not  play  an  important 
role  in  generating  a query  (which  can  be  seen  in  the  example 
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in  Section  5.2)  based  on  a given  schema.  Therefore,  in 
schema  generation,  we  do  not  consider  the  types  of 
associations  that  exist  among  E-classes.  Based  on  these 
observations,  a general  database  schema  is  generated  as 
follows.  The  number  of  E-classes  are  given  by  the  user. 
The  associations  among  the  E-classes  are  generated  based  on 
the  set  of  randomly  generated  queries.  Specifically, 
relationships  (associations)  between  two  E— classes  are 
formed  in  the  schema  when  data  (objects)  from  these  two 
E-classes  are  accessed  together  in  a query. 

Query — Generation . For  the  generation  of  queries,  the 
program  selects  a pre-compiled  query  (in  the  form  of  query 
tree)  from  one  of  the  four  types  of  pre-defined  queries. 
The  "selection"  is  based  on  the  percentage  of  occurrence  of 
the  four  types  of  queries  given  in  the  query  benchmark. 
The  E-class (es)  retrieved  by  a query  is  randomly  selected 
from  the  given  set  of  E-classes.  Each  query  is  described 
by  a set  of  operations.  Each  operation  is,  in  turn, 
represented  by  the  locations  and  sizes  of  its  input 

operands  and  the  output  size  and  destination  for  the  next 
operation. 

As  for  the  query  inter-arrival  time,  an  exponential 
distribution  (which  is  commonly  used  in  modeling  real-world 
customer  incoming/service  rates)  is  used  and  expressed  in 
the  following  formula: 
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F (t ) = exponential  distribution  function 
= 1 - e~r*t 

where  r is  the  average  query  arrival  rate,  and  t is  the 
query  inter-arrival  time.  Since  F(t)  can  be  any  number 
between  0 and  1, 

F (t)  = rand' 

= any  random  number  between  0 and  1 


or 


1 - e-r*t  = rand' 
e-r*t  = 1 - rand' 

= rand 

where  rand  is  another  random  number  between  0 and  1.  So, 

-1 

t = *ln (rand) 

r 

By  using  this  equation,  the  query  inter-arrival  times  are 
randomly  generated  which  satisfy  the  exponential 

distribution  assumption. 

Note  that  since  the  randomly  generated  schema  needs  to 
be  perfectly  embedded  in  a hypercube  for  evaluation  as 
mentioned  in  Section  7. 3. 1.3,  the  dimension  of  the 
hypercube  should  be  at  least  equal  to  the  maximum  number  of 
association  links  connected  to  any  E-class  of  the  database. 


101 


Since  the  average  number  of  association  links  of  each 
E-class  should  be  equal  to  or  less  than  4,  we  generate  the 
schema  on  a 4-D  hypercube  by  considering  each  PN  as  an 
E-class.  In  this  way,  we  can  guarantee  the  following  two 
things:  (1)  the  generated  schema  (no  matter  what  it  is)  can 

be  perfectly  embedded  in  the  hypercube,  and  (2)  the  average 
number  of  association  links  of  each  E-class  can  never  be 
more  than  4 . 

One  may  inquire  at  this  point  the  question,  what  if  the 
number  of  E-classes  is  greater  than  16,  the  maximum  number 
°f  E— classes  contained  in  a 4— D cube?  In  this  situation,  a 
higher  dimensional  hypercube  is  necessary.  However,  we 

ar9ue  that  an  evaluation  on  4— D cube  is  still  appropriate  to 
represent  the  evaluation  on  a higher  dimensional  cube.  To 
illustrate  this  point,  we  define  two  terms,  Average  Query 
Density  (AQD)  and  Average  Query  Complexity  (AQC) , as 
follows : 

query  leaving  rate 

AQD  

total  number  of  association  links 

= average  query  leaving  rate  for  each 
association  link 
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AQC(K) 


type  K query  leaving  rate 

total  number  of  association  links 

= average  type  K query  leaving  rate  for  each 
association  link 


where  K=I,  II,  III,  or  IV. 

Note  that  both  of  these  two  terms  are  defined  against 
each  association  link.  For  a higher  dimensional  cube 
(e.g. , 7-D)  with  each  node  of  the  cube  having  only  4 of  its 
7 links  to  the  other  nodes  (the  other  3 are  not  used  since 
an  E— class  in  the  schema  in  averages  less  than  or  equal  to  4 
association  links)  , if  this  7-D  cube  has  an  equal  AQD  and 
AQC(K)  for  K— I , . . , iv,  with  those  of  a 4— D cube,  then  the 
dynamic  loads  of  each  PN  and  each  communication  link  should 
be  homogeneously  distributed  throughout  the  cubes,  although 
their  dimensions  are  different.  Therefore,  the  performance 
(throughput)  of  any  local  regions  on  the  7-D  and  4-D  cube, 
respectively,  should  be  the  same.  In  other  words,  the 
evaluation  on  a "bigger"  schema  (i.e.,  more  E-classes)  can 
therefore  be  transformed  into  an  evaluation  on  the  schema 
embedded  in  a 4-D  cube.  On  the  other  hand,  if  the  average 
number  of  association  links  is  greater  than  4,  then  the 
transformation  of  evaluation  on  a higher  (than  4) 
dimensional  cube  can  not  be  applied  directly.  However,  the 
technique  for  the  evaluation  is  the  same  and,  therefore,  is 


omitted  here. 
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7 • 3 « 1 • 5 — Mapping  of  DB  schema  and  queries  onto  system 
architecture  module 

In  this  module,  the  generated  DB  schema  and  queries  are 
mapped  onto  the  selected  hypercube,  mesh,  and  bus  systems 
for  query  execution.  Since  the  schema  is  actually 
generated  on  the  hypercube  so  that  the  perfect  embedding  is 
guaranteed,  mapping  of  DB  schema  onto  hypercube  is  trivial. 

Since  the  mesh  and  bus  provide  lower  connectivity  than 
the  cube,  some  queries  that  involve  more  than  one  E-class 
(or  PNs)  may  need  to  be  mapped  onto  non-ad jacent  PNs.  As  a 
result,  routing  from  one  PN  to  another  is  necessary  for  the 
mesh  ICN.  In  this  case,  data  need  to  be  passed  through  some 
intermediate  PNs  before  arriving  at  the  destination, 
resulting  in  more  contentions  on  both  channels  and  PNs.  The 
details  for  mapping  a cube  onto  mesh  will  be  described 
shortly.  in  our  evaluation,  a shortest  path  between  two 
non-ad jacent  PNs  is  always  used.  If  several  paths  of  the 
same  distance  are  available,  the  selection  of  the  path  is 
random  so  that  the  contention  can  be  evenly  distributed, 
which  leads  to  a minimized  contention.  For  bus  system,  the 
query's  mapping  is  straightforward.  No  routing  needs  to 
be  considered.  Any  mapping  will  give  the  same  extent  of 
communication  contention  in  bus.  This  is  because  in  bus, 
communication  between  any  two  nodes  will  disable  the 
communication  among  other  nodes. 
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Note  that  in  the  mapping,  since  a query  graph  is  a 
subgraph  of  the  schema  graph,  a query  is  automatically 
mapped  onto  the  system  architecture  after  the  schema  graph 
has  been  mapped.  In  the  following,  we  only  consider  the 
mapping  of  schema  graph. 

Since  the  schema  is  generated  by  using  a query 
benchmark  and  embedded  in  the  4— D cube,  the  best  way  to 
systematically  map  the  schema  (whatever  it  is)  onto  mesh  is 
to  map  the  4-D  cube  onto  mesh  so  that  the  schema  is 
automatically  mapped.  There  are  different  algorithms  of 
mapping  a cube  onto  a mesh.  To  make  the  performance 
comparison  reasonable,  we  need  to  find  the  best  mapping, 
where  the  best  mapping  is  defined  as  follows.  For  an  edge 
in  cube,  if  its  two  terminal  nodes  are  mapped  onto  adjacent 
nodes  in  mesh,  the  distance  of  these  two  nodes  does  not 
increase.  Otherwise,  the  distance  is  increased  by  the 
number  of  extra  edges  required  to  connect  the  two  nodes  in 
the  mesh.  For  a best  mapping,  the  total  increased  distance 
for  all  edges  is  minimum.  The  performance  comparison  of 
cube  and  mesh  will  be  based  on  this  mapping. 

For  the  cube-to-mesh  mapping,  we  take  a well-known 
gray-code  mapping  strategy  [R0S81,  CHA86,  MA87].  Since  we 
assume  the  numbers  of  PNs  in  cube,  mesh,  and  bus  are  the 
same,  the  expansion  cost  in  mapping  is  a unit,  where  the 
expansion  cost  is  defined  as  follows. 


For  the  embedding  of 
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graph  G in  graph  H,  the  expansion  cost  of  the  embedding  is 
the  ratio  of  the  size  (in  number  of  vertices)  of  H to  the 
size  of  G [HON83,  CHA88 ] . This  assumption  is  to  insure  the 
fairness  in  performance  comparisons  in  different 
architectures . Under  this  assumption,  the  gray— code  mapping 
strategy  can  give  a minimum  dilation  cost,  where  dilation 
cost  is  the  maximum  distance  in  H between  the  images  of 
vertices  that  are  adjacent  in  G,  if  G is  embedded  in  H. 

The  mapping  of  cube  queries  onto  mesh  can  be 
illustrated  by  a simple  example.  Let  us  first  consider  the 
mapping  from  a ring  onto  a one— dimensional  array,  as  shown 
in  Figure  7.4.  Supposing  that  a minimum  dilation  cost  is 
required  in  the  mapping,  and  on  the  ring  there  are  six 
nodes,  labelled  one  to  six.  To  map  these  nodes  onto  the 
array,  the  minimum  dilation  is  2 (since  a dilation  of  1 is 
obviously  impossible)  . The  mapping  result  is  shown  in 

Figure  7.4(b) . Note  that  since  the  edges  of  each  face  of  a 
cube  is  a ring  (or  square)  , we  use  the  same  strategy  in 
mapping  cube  onto  mesh.  First,  each  node  of  the  cube  is 
assigned  a gray-code  address  as  shown  in  Figure  7.5(a). 
The  mapping  result  is  shown  in  Figure  7.5(b).  The  address 
of  a node  on  the  mesh  is  obtained  by  taking  the  two 
most  bits  on  the  horizontal  axis  concatenating  with 
the  two  right -most  bits  on  the  vertical  axis.  For  example, 
the  bottom-right-most  node  address  will  be  1111. 


This 
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mapping  is  obviously  minimum  in  dilation,  since  for  every 
pair  of  adjacent  nodes  on  cube,  their  corresponding  nodes  on 
mesh  have  a maximum  distance  of  2 . In  our  evaluation 
system,  this  mapping  algorithm  is  used  for  the  mapping  of 
schema  and  queries  onto  a mesh. 

7.3.1. 6 Query  execution  module 

In  this  module,  each  query  is  executed  by  using  our 
two-phase  processing  strategy.  The  first  phase  of  each 
query  is  executed  in  cube,  mesh,  and  bus  architectures 
individually.  Output  of  the  first  phase  in  each 
architecture  is  sent  to  another  three  architectures  for  the 
second  phase  execution.  Therefore,  there  are  totally  nine 
experiments  to  be  done. 

There  are  several  iteration  loops  to  control  the 
execution  progress  and  the  system  status.  The  outmost  loop 
controls  a system  clock,  which  advances  a small  time  period 
for  each  iteration.  When  the  time  of  system  clock  is  equal 
to  a query  arrival  time,  the  execution  of  that  query  starts 
and  proceeds  in  a pipelined  fashion.  This  loop  will 
terminate  when  all  the  given  queries  have  complete  their 
execution.  Inside  the  outmost  loop,  a second  loop  is  used 
to  check  and  update  the  status  of  every  PN,  including  the 
channels  between  PNs.  Note  that  a channel  is  modeled  as 
another  PN  (or  a virtual  PN)  . it  basically  does  the  same 
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thing  as  a PN  does,  except  that  its  processing  rate  for 
incoming  data  blocks  from  any  query  is  a constant. 

In  each  PN,  if  more  than  one  operations  (from  different 
queries)  are  performed  in  a node  simultaneously,  the  CPU 
time  is  assigned  to  those  operations  that  have  a higher 
priority.  The  priority  is  determined  by  the  portion  of  the 
input  operand  (s)  that  have  received  by  the  node.  The 
larger  the  portion,  the  higher  priority  the  operation  is 
assigned.  If  these  are  identical  for  two  operations,  then 
the  operand  input  rates  of  the  two  operations  are  compared. 
The  larger  one  has  a higher  priority.  If  these  are 

identical  again,  the  priority  is  arbitrarily  assigned.  For 
the  execution  of  an  operation  in  PN,  a function 

corresponding  to  that  operation  is  called.  The  returned 

value  of  this  function  call  is  the  time  spent  to  process  one 

block  of  input  data.  This  time  is  calculated  by  using  the 

equations  to  be  presented  next  in  Section  7.2.2. 

Finally,  the  average  throughput  (output)  of  the 
simulation  is  output  to  the  user. 

7.3.2  The  Analytic  Model 

In  Section  7.3.1,  we  have  presented  the  general 
concepts  and  design  issues  of  the  DACPES  system.  The 

concurrent  processing  of  multiple  queries  within  the  OFC  is 
simulated  by  using  this  simulation  system.  At  the  bottom 
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level  of  this  system,  the  processing  time  spent  for  each 
block  of  data  processed  within  a PN  is  computed  by  using  a 
cost  function  via  a function  call.  The  cost  functions  of 
the  OFC  primitive  operations  are  derived  based  on  an 
analytic  model.  In  the  following,  we  present  the  model. 

The  parameters  used  in  our  model  are  listed  in  Table 
7.1.  Their  default  values  are  given  in  the  parenthesis. 
In  this  table,  we  have  assumed  a base  file  to  be  the  file 
stored  in  the  PN  where  the  operation  is  performed,  and 
input  data  to  be  the  data  sent  from  other  PN  to  this  PN  for 
operation.  Most  of  the  workload  parameters  do  not  have  a 
default  value  because  their  values  are  dependent  on  the 
query  types.  Note  that  since  WB  (Wj)  can  be  OID  size  in 
some  special  joins,  we  will  use  either  WB  (Wj)  or  OID_size 
interchangeably  in  the  analysis  of  those  cases. 

For  some  operations  such  as  selection  and  projection, 
only  one  operand  (which  is  always  the  base  relation)  is 
necessary.  Therefore,  I and  Wj  are  nulls  in  these  cases. 
In  all  of  the  OFC  operations,  we  assume  the  output  tuples 
are  uniformly  selected  from  input  data  and  base  relation. 
In  each  time  (or  instruction)  unit,  the  CPU  can  only  move 
or  compare  a 4 byte  value.  We  also  assume  that  the  OIDs  of 
objects  in  one  E-class  are  assigned  within  a certain  range. 
The  reason  for  this  is  to  utilize  the  mapping  of  OID  values 
into  a bit  array  to  facilitate  processing  of  some  special 
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joins.  In  our  evaluation,  the  bit  array  is  assumed  to  be 
implemented  in  RAMs.  The  size  of  the  bit  array,  therefore, 
need  only  be  large  enough  to  cover  the  possible  values 
within  a range  of  OID  rather  than  all  possible  OID  values. 
The  idea  for  maintaining  OID  values  within  a range  can  be 
similar  to  the  idea  of  "tracks"  in  a disk.  If  the  size  of 
a file  is  larger  than  one  track,  then  the  file  can  be 
allocated  in  a number  of  tracks.  These  tracks  need  not  be 
consecutive  tracks.  The  OID  ranges  of  each  E-class  can  be 
maintained  by  DBMS. 

In  the  following,  we  derive  the  cost  (in  time)  required 
to  perform  OFC  operations.  In  each  operation,  we  first 
present  the  algorithm.  Then,  the  time  required  in  each 
step  of  the  algorithm  is  derived.  Ti  is  the  cost  of  step  i 
in  the  algorithm.  And  T_0P  is  the  total  cost  of  performing 
the  OP  operation. 

(i)  SL  operation: 

Algorithm 

(1)  Load  the  base  file  (if  the  data  are  clustered,  only 
load  the  part  of  interest) . 

(2)  Perform  selection  on  the  data. 

(3)  Write  selected  data  to  output  buffer. 
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Cost 

T1  = B*tseg  (if  non-clustered) 
b*tran  (clustered) 

T2=  B* (page_size/WB) * (attr_size/4 ) * (R+C) /cpu_spd 
(if  non-clustered) 

= b* (page_size/WB) * (attr_size/4) * (R+C) /cpu_spd 
(if  clustered) 

T3  = U*(page_size/Wu) *[ (Wy/4) *(R+W) ]/cpu_spd 

In  the  equations,  the  division  of  attribute  size  and  by  4 
is  because  of  our  assumption  that  each  memory  access  and 
comparison  can  be  operated  on  4 bytes. 

Since  the  loading  of  base  data  can  proceed  in  parallel 
with  the  other  two  processes,  the  total  time  to  complete 
the  operation  is 

T_SL  = max(Tl,  T2+T3) 

Note  that  the  derivation  of  time  cost  for  clustered  base 
data  is  similar  to  the  derivation  for  non-clustered  case. 
In  the  following  operations,  we  only  discuss  the  latter 
case.  The  result  of  former  case  can  be  derived  easily  from 
our  given  equations. 

For  the  following  binary  operations,  we  derive  the  cost 
in  terms  of  the  time  to  process  n input  data  blocks.  We 
the  incoming  data  blocks  to  a node  can  be 


assume  that 
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stored  in  an  input  data  buffer  (different  from  the  main 
memory)  of  that  node.  We  consider  this  assumption 

reasonable  because  the  size  of  OID,  which  is  used  in  those 
operations,  is  very  small.  For  a 1 Mbytes  of  buffer, 
1,000,000/4=  250,000  OIDs  can  be  stored  during  the 
processing.  We  also  assume  that  the  bit  array  size  is 
negligible  in  comparison  with  the  size  of  main  memory, 

since  10,000  OIDs  only  needs  a bit  array  of  size  about 
10,000  bits,  or  1.25K  bytes. 

(ii)  SJ  operation: 

There  are  two  cases  in  this  operation,  depending  on 
which  of  the  two  operands  is  a single  column  operand. 

CASE  A:  the  input  data  is  a single  OID  column. 

Algorithm 

(1)  Map  n blocks  of  input  OIDs  onto  the  bit  array. 

(2)  Load  the  base  relation. 

(3)  For  each  of  the  base  tuple,  map  its  OID  onto  the  bit 
array  and  check  the  corresponding  bit  value.  If  the 
bit  value  is  1,  the  base  tuple  is  retained.  Otherwise, 
discard  the  tuple. 

(4)  Write  join  result  to  output  buffer. 
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Cost 

T1  = n*(page_size/OID_size) * (0ID_size/4 ) * (R+W) /cpu_spd 
T2  = B*tseq 

T3  = B* (page_size/WB) * (0ID_size/4) * (R+R+C) /cpu_spd 
T4  = (n*U/I) * (Wy/4 ) * (R+W) /cpu_spd 

In  T3 , there  are  two  Rs  and  one  C for  each  OID,  because  the 
OID  needs  to  be  read  from  memory,  and  check  (i.e.,  the  C) 
with  the  corresponding  bit  retrieved  (i.e.,  another  R)  from 
the  bit  array.  In  T4,  the  fraction  (n*U/I)  is  obtained 
based  on  the  assumption  that  joinable  tuples  are  uniformly 
distributed  over  the  incoming  OIDs  and  the  base  tuples.  n 
OID  blocks  can  thus  generate  n*U/I  output  blocks  of  data. 

Since  the  processes  in  steps  1,  3,  and  4 must  be 

performed  in  serial,  and  these  processes  can  be  in  parallel 
with  step  2 , the  total  time  consumed  is 

T_SJ  = max(Tl+T3+T4 , T2) 

’ only  one  column  of  data  in  the  base  relation  is  used 
in  the  join  process 

Algorithm 

(1)  Load  the  single  column  of  base  relation. 

(2)  Map  these  base  data  (OIDs)  onto  the  bit  array. 

(3)  Map  n blocks  of  input  OID  tuples  onto  bit  array  and 
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check  the  bit  value.  If  it  is  1,  retain  the  tuple. 
Otherwise,  discard  it. 

(4)  write  joined  result  to  output  buffer. 

Cost 

T1  = B*tseq 

T2  = B* (page_size/WB) * (Wg/4 ) * (R+W) /cpu_spd 

T3  = n* (page_size/OID_size) * (0ID_size/4) * (R+R+C) /cpuspd 

T4  = (n*U/I) * (Wy/4) * (R+W)/cpu_spd 

Since  step  1 and  step  2 of  the  algorithm  can  be  performed  in 
parallel,  and  are  in  serial  with  step  3 as  well  as  step  4, 
the  total  time  cost  is 

T_S J = max(Tl,  T2)  + T3  + T4 

Note  that  in  performing  the  SJ  operation,  although 
algorithms  of  case  A and  case  B may  have  the  same  cost  in 
individual  steps,  the  total  costs  are  different  because  of 
the  different  orders  in  the  procedures. 

(iii)  RJ  operation: 

There  are  also  two  cases  for  this  operation,  one  is 
that  the  input  data  are  sorted  and  another  is  the  base  data 
is  sorted. 
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CASE  A:  the  input  data  are  sorted. 

Algorithm 

(1)  Load  base  file. 

(2)  Hash  base  file  into  buckets.  The  hash  is  based  on  a 
distribution-based  hash  algorithm;  meaning,  each 
bucket  is  assigned  to  a small  OID  value  range,  and  an 
OID  value  falling  into  that  range  is  hashed  into  the 
corresponding  bucket.  The  result  of  this  step  is  that 
OID  values  in  the  buckets  are  externally  in  order,  but 
not  yet  internally.  For  ease  of  calculation,  we 
assume  the  bucket  size  is  about  egual  to  the  page  size 
(This  is  feasible  by  varying  the  OID  value  range  to  be 
assigned  to  each  bucket) . 

(3)  Unload  hashed  buckets  that  can  not  be  kept  in  memory 
back  to  disk  and  leave  as  many  buckets  in  memory  as 
possible. 

(4)  Sort  buckets  of  data  in  memory  and  merge  the  sorted 
data  with  input  data. 

(5)  Write  the  join  result  to  output  buffer. 

Cost 

T1  = B*tseg 

T2  = (B*page_size/WB)*[ (Wb/4) *(R+C+W) ]/cpu_spd 

T3  = max ( 0 , B-M) *tseq 
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As  for  T4,  the  cost  is  derived  as  follows.  Since  joinable 
tuples  are  assumed  to  be  uniformly  distributed,  n input 
blocks  of  data  should  consume  (merge- join  with)  n*B/I 
blocks  of  base  file  data.  There  are  two  sub-cases  in 
calculating  the  time  required  to  complete  this  step. 

subcase  1;  M >=  n*B/l,  then  there  is  no  need  to  load 
data  from  secondary  storage.  n*B/l  blocks  of  base  data  can 

be  found  in  memory,  and  sorted  and  merge- joined  with  input 
data. 


subcase  2 : M < n*B/I,  then  M blocks  of  base  data  can 

be  found  in  memory.  The  other  (n*B/I  - M)  blocks  need  to 
be  loaded  from  disk.  The  sort -merge  join  can  be  performed 
afterwards . 

The  processing  cost  of  these  two  subcases  can  be 

wr-^^en  -*-n  one  equation  as  follows.  First  we  define  the 
terms : 


Tsl  = sort  base  data  that  are  kept  in  memory 
^s2  = s°^t  the  other  n*B/I  - M blocks  of  data,  if 
necessary 

— load  n*B/I  — M blocks  of  data,  if  necessary 

— mer<?e— join  the  input  data  with  base  data 


Note  that  the  base  data  after  hashing,  are  already 
externally  in  order.  In  this  stage,  we  sort  them  in  the 
unit  of  a block.  Supposing  the  cost  of  sorting  is  K*log(K) 
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reads,  comparisons,  and  writes  for  K data  items. 
Therefore, 

Tsl  = min (n*B/I , M) * [ (page_size/WB) *log (page_size/WB) ] 

* [ (Wb/4) * (R+C+W) ]/cpu_spd 

ts2  = max(0,  n*B/I-M) * [ (page_size/WB) *log (page_size/WB) ] 

* [ (WB/ 4 ) * (R+C+W) ] /cpu_spd 
Tl  = max ( 0 , n*B/I-M) *tseq 

tM  = [n* (Page_size/Wj) +n* (B/I) * (page_size/WB) ] 

* (0ID_size/4 ) * (R+C+W) /cpu_spd 

Since  the  loading  of  base  data  can  be  in  parallel  with  the 
sorting  and  merge- j oining  processes,  the  time  required  in 
this  step  is 

T4  = max(Tsl+Ts2+TM,  Tl) 

As  for  step  5,  it  is  obvious  that 

T5  = n*(U/I) *(page_size/Wu) *(Wa/4) * (R+W) /cpu_spd 

Since  step  1 and  step  2 can  be  in  parallel,  and  so  are  step 

3 and  step  4,  the  total  processing  cost  for  n input  blocks 
of  data  is 


T_RJ  = max (Tl , T2)  + max(T3,  T4)  + T5 
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CASE  B:  the  base  data  are  sorted 
Algorithm 

(1)  Sort  n blocks  of  input  data. 

(2)  Load  base  file. 

(3)  Merge- join  input  data  with  base  data. 

(4)  Write  join  result  to  output  buffer. 

Cost 

T1  = n*(page_size/WI)  *log[n*(page_size/WI)  ] 

* [ (wi/4) * (R+C+W) ]/cpu_spd 
T2  = B*tseq 

T3  = [n*(page_size/WI)+B*(page_size/Wu) ] 

* (0ID_size/4 ) * (R+C+W) /cpu_spd 
T4  = n*(U/I)*(page_size/Wu)*(Wu/4)*(R+W)/cpu_spd 

Since  loading  of  base  file  can  be  in  parallel  with  three 
sequential  steps,  i.e.,  Tl,  T3 , and  T4 , the  total  processing 
cost  is 

T_RJ  = max(T2,  T1+T3+T4) 

(iv)  RSJ  operation: 

For  RSJ  operation,  there  are  several  possible  subcases 
depending  on  the  features  of  the  two  input  operands.  First, 
if  the  RSJ  operation  is  used  in  the  first  phase  without  the 
restriction  that  output  data  have  the  same  OID  order  as  the 
single  column  input  of  unsorted  OIDs  (refer  to  Section  4.3), 
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then  it  is  the  same  as  an  SJ  operation  except  that  one  of 
it's  input  operands  is  sorted.  Under  this  condition,  there 
are  four  subcases  as  shown  in  the  following. 

- The  input  data  are  single  column  of  sorted  OIDs. 

- The  input  data  are  single  column  of  unsorted  OIDs  and 

the  base  data  are  in  sorted  order. 

The  input  data  are  multiple  columns  of  OIDs  in  sorted 
order. 

- The  input  data  are  multiple  columns  of  unsorted  OIDs 

the  base  data  are  in  sorted  order. 

The  analyses  for  these  four  subcases  are  similar  to  those  in 
SJ  and  RJ  operations.  For  conciseness  in  presentation,  they 
are  omitted  here. 

Next,  we  present  the  algorithm  and  analysis  for  the  RSJ 
used  in  the  second  phase  of  query  processing,  i.e.,  the 
input  data  are  single  column  of  unsorted  OIDs  and  the  base 
file  is  sorted,  and  with  the  restriction  that  output  data 
are  in  the  same  order  as  the  single  input  column  of  unsorted 
OIDs. 

Algorithm 

(1)  Map  n blocks  of  input  OIDs  onto  bit  array. 

(2)  Load  base  file. 

(3)  For  each  of  the  base  tuple,  map  its  OID  onto  bit 
array.  If  the  corresponding  bit  value  is  1,  then  keep 
the  tuple.  Otherwise,  discard  it. 
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(4)  For  each  input  OID,  do  binary  search  in  the  selected 
base  tuples  to  match  the  base  tuple  with  its  OID.  The 
number  of  comparisons  (including  a read  and  a 
comparison  operations)  required  for  each  input  OID  in 
the  search  process  is  log(k),  if  there  are  k data 
items  to  be  searched. 

(5)  Write  the  result  to  output  buffer. 

Note  that  in  step  3,  we  expect  the  selected  tuples  can  be 
retained  in  main  memory.  in  other  words,  we  have  the 
constraint 

n* (U/I)  <=  m 


or 


n <=  M*I/U 

This  tells  us  that  we  can  not  map  more  than  M*I/U  blocks  of 
input  data  onto  bit  array  at  one  time. 

Cost 

T1  = n*(page_size/WI) *(Wx/4) *(R+w)/cpu_spd 

T2  = B*tseq 

T3  = B* (page_size/WB) * (0ID_size/4) * (R+W)/cpu_spd 
T4  = n* [ (Wj/4) *R+log(n* (U/I) * (page_size/WB) ) 

* (0ID_size/4 ) * (R+C) ]/cpu_spd 
T5  = n*  (U/I)  * (Wjj/4)  * (R+W)/cpu_spd 
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Since  the  loading  of  base  relation  can  be  in  parallel  with 
the  processing  of  data,  the  total  processing  time  is 

T_SJ  = max (T2 , T1+T3+T4+T5) 

(v)  JN  operation: 

Algorithm 

(1)  Load  base  file. 

(2)  Hash  base  file. 

(3)  Unload  to  disk  these  hashed  buckets  that  can  not  be 
kept  in  main  memory. 

(4)  Hash  input  data  into  buckets. 

(5)  Join  two  hashed  operand  data. 

(6)  Reloadbasedata (if  necessary)  from  disk  for  join. 

(7)  Write  join  result  to  output  buffer. 

Cost 

T1  = B*tseq 

T2  = B* (page_size/WB) * (Wg/4 ) * (R+C+W) ]/cpu_spd 
T3  = max ( 0 , B-M) *tseq 

T4  = n*(page_size/WI) *(WI/4) * (R+C+W) /cpu_spd 
T5  = [n*(page_size/Wj)*(Wj/4)*F*(R+C) 

+B*(page_size/WB) *(wB/4) *R]/cpu_spd 

where  F is  a fudge  factor,  the  one  that  was  used  in  [DEW84], 
in  which  F accounts  for  the  different  types  of  increments 
in  hashing  and  sorting. 


T6  = max ( 0 , B-M) *tseq 

T7  = U* (Wy/ 4 ) * (R+W) / cpu spd 


Since  step  1 and  step  3 can  be  in  parallel  with  step  2 and 
4,  and  step  6 in  parallel  with  step  5 and  7,  the  total 
processing  cost  is 

T_JN  = max (T1+T3 , T2+T4)  + max(T6,  T5+T7) 
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Table  7.1.  Parameters  and  their  values  used  in  the 
simulation 


Workload  Parameters: 


attr_size 

OID_size 

I 

b 

B 

U 

WI 

WB 

% 


attribute  size 
OID  size  ( 4 bytes) 

- number  of  pages  of  entire  input 

-number  of  pages  that  store  the  indices  of  an 
indexed  relation  (1..2  pages) 

- number  of  pages  of  base  file 

- number  of  pages  of  produced  data 

- width  of  input  data,  in  bytes,  to  a PN 

- width  of  base  file  in  bytes 

- width  of  output  data  in  bytes 


System  Parameters: 


cpuspd 

R 

W 

C 


tran 

t-seq 

M 

page_size 


cpu  speed  (5  MIPS) 

- number  of  CPU  cycles  for  a read  of  one  word 
from  memory  (1) 

- number  of  CPU  cycles  for  a write  of  one  word 
to  memory  (1) 

- number  of  CPU  cycles  for  a comparison  on  two 
one  word  data,  or  for  a computation  such  as 
hashing  on  a one  word  value  (2.. 3) 

- disk  random  access  time  (20  ms)  per  page 

- disk  sequential  access  time  (5  ms)  per  page 

- main  memory  size  (1  Mbytes  or  250  pages) 

- page  size  (4  Kbytes) 
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ICN:  Interconnection  Network 
P : Processor 
SSU:  Shared  Storage  Unit 
SU  : Storage  Unit 

(a)  Share  Everything.  (b)  Share  Nothing. 

Figure  7.1.  The  abstract  models  for  share  everything 
and  share  nothing  architectures . 
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Figure  7.2.  A two-tier  OFC  system  architecture. 
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Figure  7.3.  The  functional  flow  chart  of  the  DACPES 
simulation  package. 
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(a) 


(b) 


Figure  7.4. 


The  minimum  dilation  mapping  for  the 
nodes  on  a ring  onto  an  array. 


(a) 


(b) 


Figure  7.5. 


The  minimum  dilation  mapping  for  the 
nodes  on  a hypercube  onto  a mesh. 


CHAPTER  8 

SIMULATION  RESULTS  AND  EXPLANATION 

For  query  processing  in  OFC,  the  first  phase  is  to 
relate  objects  through  their  semantic  relationships. 
Intuitively,  an  ICN  with  high  connectivity  should  be 

beneficial,  since  more  communication  channels  are  provided. 
However,  the  amount  of  output  data  of  each  PN  in  the  first 
phase  is  usually  very  small  (only  oiDs)  . For  the  second 
phase  processing,  the  task  is  to  access  descriptive  data 
from  the  classes.  The  amount  of  data  to  be  accessed  and 
transferred  among  PNs  can  be  very  large  (comparing  with  the 
data  in  the  first  phase) . A high  connectivity  ICN  can  also 
help  to  relieve  contention  in  this  case.  However,  the 

degree  of  benefit  of  having  more  communication  channels  for 
both  phases  is  not  obvious  and  needs  to  be  carefully 

evaluated. 

In  this  chapter,  we  perform  some  experiments  to 
determine  the  most  suitable  ICN  architecture  for  each  of 
the  two  individual  processing  phases.  As  mentioned  in 

Chapter  7,  the  experiments  are  performed  on  three  ICN 

architectures  for  each  processing  phase  which  represent 
three  different  degrees  of  connectivity.  This  simulation 
is  performed  using  the  DAtabase  Computer  Performance 
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Evaluation  System  DACPES , which  was  described  in  the 
previous  chapter.  The  metric  to  be  measured  is  OFC  system 
throughput  as  measured  by  the  number  of  queries  completed 
per  second  in  a concurrent  multi-query  environment.  In  the 
simulation,  we  vary  a set  of  parameter  values  to  obtain  the 
results  under  different  conditions.  All  combinations  of 
the  architectures  in  two  processing  phases  are  tested  under 
each  condition.  The  simulation  results  and  discussions  are 
presented  in  Section  8.1,  followed  by  a summary  of  the 
significance  of  the  results  in  Section  8.2. 

8 • 1 Simulation  Results  and  Discussion 

In  this  section,  the  simulation  results  are  presented. 
Observation  and  conclusions  derived  from  the  results  are 
discussed.  The  simulation  results  are  obtained  by  varying 
the  following  parameters:  query  input  rate  (q_rate)  , 
bandwidth  (BW) , BW_ratio,  processing  power  (PPW) , type  of 
query  mix,  and  type  of  ICN  in  the  first  and  the  second 
phase.  For  convenience,  the  BW  of  cube,  mesh,  and  bus 
interconnection  will  be  denoted  as  C_BW,  M BW,  and  B BW, 
respectively.  In  the  following,  there  are  five  set  of 
figures,  each  measuring  the  OFC  system  throughput  by 
varying  some  key  parameter  values.  For  clarity,  the  purpose 
of  showing  each  figure  is  presented  first,  followed  by  a 
discussion  on  the  results  showing  in  the  figures. 
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(A)  Figure  8.1 

The  purpose  of  Figure  8.1  is  to  show  that  the 
bottleneck  of  the  two— phase  PQPS  is  in  the  second  phase  and 
thus  the  type  of  ICN  structure  used  in  the  first  phase  does 
not  have  a large  impact  on  the  result.  The  parameters  and 
their  values  used  in  this  figure  are  q_rate=100 
(queries/sec),  BW_ratio=l : 10  (B_BW  is  10  times  greater  than 
C_BW  and  M_BW) , and  PPW=10  (Each  PPW  has  a processing  power 
of  5 MIPS,  and  a disk  random  access  time  of  20  ms/page  and  a 
sequential  access  time  of  5 ms/page).  Figure  8.1(a), 
8.1(b),  and  8.1(c)  show  the  throughput  of  the  first  phase 
processing  (second  phase  not  included)  within  the  OFC  system 
under  different  query  mixes.  The  B_BW  is  varied  at  1,  5, 
and  10  Mbytes/sec.  Hence,  the  actual  C_BW  and  M_BW  is  1/10 
of  the  B_BW  value  shown  in  the  figure  (since 
BW_ratio=l: 10) . For  example  in  the  first  column  of  Figure 
8.1(a),  the  B_BW=l  Mbytes/sec  indicates  that  the  B_BW  is  1 
Mbytes/sec  and  C_BW  and  M_BW  are  0.1  Mbytes/sec.  In  the 
figures,  we  have  used  XI  to  indicate  the  X type  of  ICN 
structure  in  the  first  phase  and  X2  for  that  in  the  second 
phase,  where  X can  be  C (cube)  , M (mesh)  , or  B (bus)  . For 
example,  Cl  indicates  using  a cube  ICN  in  the  first  phase. 
Figure  8.1(d)  shows  the  overall  throughput  (including  the 
second-phase  processing)  for  the  query  mix  that  contains 
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the  most  complex  queries  (HCQM)  . The  reason  that  we  show 
the  overall  throughput  only  for  HCQM  type  query  mix  will  be 
explained  later. 

The  observations  obtained  from  these  figures  are  as 
follows : 

(1)  From  Figures  8.1(a),  8.1(b),  and  8.1(c),  we  can 
see  that  in  general  in  the  first  phase  processing,  the 
higher  the  BW,  the  higher  is  the  throughput.  Also,  the 
higher  the  percentage  of  complex  queries,  the  lower  is  the 
throughput.  This  is  true  independent  of  the  ICN  being  used. 
Moreover,  the  higher  the  complexity  of  query  mix,  the  more 
sensitive  is  the  throughput  to  the  type  of  ICN  at  low  BW 
(B_BW=1) . 

(2)  Figure  8.1(d)  shows  the  overall  throughput  of  all 
nine  combinations  of  the  ICNs  in  two  phases  at  three 
different  B_BWs.  The  values  in  a row  represent  the  overall 
throughputs  of  using  the  ICN  in  question  in  the  first 
phase,  combined  with  each  of  the  three  ICNs  in  the  second 
phase.  Conversely,  the  values  in  a column  represent  the 
overall  throughputs  of  using  different  ICNs  for  the  first 
phase  and  the  ICN  in  question  for  the  second  phase.  In 
general,  the  overall  throughput  also  increases  when  the 
bandwidth  of  ICN  increases.  However,  the  overall  throughput 
is  much  lower  than  the  throughput  of  the  first  phase.  For 
example,  for  Cl  where  B_BW=5,  the  throughput  is  89.29  in 
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Figure  8.1(a).  Whereas  for  C1/C2  and  B_BW=5 , the  throughput 
is  54.54.  This  indicates  that  the  bottleneck  of  the 
two-phase  processing  is  in  the  second  phase.  Therefore,  no 
matter  how  fast  the  processing  is  in  the  first  phase,  the 
data  collection  in  the  second  phase  always  slows  down  the 
process  and  data  tend  to  be  accumulated  in  this  phase.  This 
is  to  be  expected  because  during  the  first  phase  processing, 
the  data  to  be  processed  are  OIDs,  which  are  very  small  in 
size  in  comparison  with  the  size  of  actual  values  to  be 
processed  in  the  second  phase.  For  illustration,  let  us 
assume  the  size  of  an  OID  to  be  4 bytes.  Thus,  each  data 
page  (4  Kbytes)  contains  1000  OIDs.  Therefore,  for  a query 
to  process,  say,  a few  thousands  of  objects,  only  a few 
pages  are  required.  Obviously,  this  will  not  cause  serious 
communication  contention  among  PNs.  However,  each  of  the 
OIDs  may  correspond  to  a complex  object  so  that  a large 
amount  of  data  needs  to  be  processed  and  transferred  among 
PNs,  causing  much  more  serious  communication  contention  in 
the  second  phase.  Thus,  the  advantage  of  using  high 
performance  cube  and  mesh  ICNs  in  the  first  phase  is 
nullified  by  the  bottleneck  in  the  second  phase.  This  leads 
to  an  important  conclusion  that  for  the  first  phase 
processing,  a simpler  and  less  costly  bus  ICN  structure  is 
adequate  for  data  transfer  during  query  processing. 
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The  same  conclusion  can  also  be  obtained  by  observing 
the  fact  that  the  value  difference  between  rows  is  smaller 
than  the  difference  between  columns.  For  example,  the 
difference  between  the  first  row  and  the  second  row  at  a 
B_BW  equal  to  5 Mbytes/sec  is  less  than  1 (e.g., 
54.54-54.12  = 0.42,  43.25-42.85  = 0.40,  46.58-46.01=0.57). 
This  reinforces  the  observation  that  the  type  of  ICN  used  in 
the  first  phase  does  not  have  much  of  an  impact  on  the 
result.  However,  the  difference  between  the  first  column 
and  the  second  column  at  B_BW=5  Mbytes/sec  is  much  larger 
(i.e.,  54.54  - 43.25  = 11.19,  54.12  - 42.85  = 11.27,  54.49  - 
43.10  — 11.39)  . This  implies  that  the  use  of  different 
ICNs  in  the  second  phase  does  have  a significant  impact  on 
the  system  throughput. 

Note  that  Figure  8.1(d)  shows  the  throughput  under  a 
HCQM  environment.  If  other  types  of  query  mixes  are  used, 
the  difference  in  overall  throughput  will  be  smaller  than 
those  shown  in  the  figure.  In  that  case,  the  type  of  ICN 
used  in  the  first  phase  has  even  less  significance.  For 
this  reason,  the  results  for  other  types  of  query  mixes  are 
not  shown . 

Based  to  the  conclusion  obtained  from  Figure  8.1,  the 
performance  results  shown  in  the  remaining  figures  are  based 
on  using  a bus  architecture  in  the  first-phase. 
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(B)  Figure  8.2 

In  Figure  8.2,  the  bandwidth  is  varied  to  determine  the 
P®^"^ ormance  of  each  ICN  type  in  phase  two  under  different 
BW_ratios.  ppw  is  also  varied  in  Figure  8.2(a).  Figures 
8.2(a),  8.2(b),  and  8.2(c)  show  different  BW_ratios  of  1:1, 
1:10,  and  1:100,  respectively.  The  horizontal  axis  of  each 
figure  is  represented  by  the  logarithm  of  B_BW.  Note  that 
when  B_BW  is  given,  then  C_BW  and  M_BW  can  be  determined  by 
using  the  BW_ratio.  The  guery  mix  is  MCQM  type,  i.e.,  40% 
of  type  I and  20%  of  each  of  the  other  types  of  queries. 
The  q_rate  is  100  queries/sec,  and  the  first-phase  ICN 
architecture  is  a bus  ICN. 

The  observations  obtained  from  these  figures  are  as 
follows : 

(1)  Figure  8.2(a)  shows  that  in  general,  the 
throughput  increases  with  an  increasing  BW.  This  is 
intuitively  obvious  since  a high  BW  can  relieve  the 
contention  in  communication. 

(2)  Figure  8.2(a)  also  shows  that  the  higher  the  PPW, 
the  higher  is  the  BW  for  the  curves  to  reach  their  bounds. 
This  is  because  a PN  with  a high  PPW  can  produce  result  at 
a higher  rate  (than  a PN  with  a lower  PPW)  , causing  more 
serious  contention  in  a communication  channel. 

(3)  When  the  PPW  is  increased  from  5 to  10  in  Figure 
8.2(a),  the  performance  increases  significantly. 


However, 
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it  does  not  increase  very  much  while  PPW  is  increased  from 
10  to  15. 

(4)  Figure  8.2(a)  also  shows  that  the  throughput  of 
bus  architecture  starts  to  increase  at  a much  higher  BW  than 
that  of  the  cube  and  mesh  system,  independent  of  the  PPW. 
This  implies  that  if  each  individual  channel  in  a cube  and 
mesh  ICN  has  a same  BW  as  a bus  system  (i.e.,  BW  ratio=l:l), 
then  the  cube  and  mesh  systems  always  greatly  outperform 
the  bus  system. 

(5)  When  the  BW_ratio  is  1:1,  as  shown  in  Figure 
8.2(a)  , the  cube  architecture  has  the  best  performance  for 
all  BWs . The  mesh  architecture  has  a better  performance 
than  the  bus  system  for  a medium  and  low  BW.  The 
performance  of  the  bus  system  is  better  than  the  mesh 
system  for  a BW  higher  than  3 Mbytes/sec  (for  PPW=5)  or  5 
Mbytes/sec  (for  PPW=10)  , and  is  getting  closer  to  the 
cube's  performance  as  increasing  the  BW.  While  BW  is  low, 
the  mesh  system  performs  better  than  the  bus  system  because 
in  mesh  system,  the  communication  load  is  distributed  among 
the  provided  communication  channels.  While  the  BW  is  high, 
bus  outperforms  mesh. 

This  result  can  also  be  obtained  by  performing  a 
qualitative  analysis  as  follows.  First,  we  classify  the 
communication  contention  into  two  types:  contention  on  a 
channel  and  contention  in  a PN  as  data  is  transferred 
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through  an  intermediate  PN  from  a source  PN  to  a target  PN. 
Each  of  these  two  types  can  be  further  classified  into 
another  two  types  of  contentions:  intra-guery  contention  and 
inter-query  contention.  The  existence  of  each  type  of 
contention  is  presented  in  the  table  below.  Note  that  the 
processor  contention  is  not  considered  here  because  the 
extents  of  processing  contention  in  three  cases  (i.e., 
cube,  mesh,  and  bus)  are  the  same. 

Communication  Contention 
system  channel  PN 

intra-query  inter-query  intra-query  inter-query 


- 1 
cube  | 

no 

yes 

no 

yes 

1 

mesh  | 

yes 

yes 

yes 

yes 

bus 

yes 

yes 

no 

yes 

We  define  the  intra-query  contention  as  follows.  At  a time 
instant,  a hardware  component  (PN  or  channel)  needs  to 
transfer  data,  which  are  generated  by  different  operations 
of  the  same  query  and  performed  in  different  source  PNs,  to 
target  PNs.  Similarly,  the  inter-query 
contention  is  defined  for  transmitting  data  that  are 
generated  by  operations  of  different  queries.  As  we  can 
see,  although  inter-query  contention  always  exists  in  any 
system,  the  cube  system  does  not  have  intra— query 
contention  (i.e.,  an  ideal  machine  in  that  sense) . Mesh 
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system  has  both  types  of  intra-query  contention,  and  a bus 
system  does  not  have  intra-query  contention  in  PN  because 
of  the  reason  previously  described. 

The  right  half  of  the  table  (PN  contention)  tells  us 
that  since  cube  and  bus  systems  do  not  have  intra-query 
contention,  the  performances  of  cube  and  bus  should  be 
better  than  mesh  as  long  as  the  communication  BW  is  not  a 
limiting  factor  during  query  processing.  Similarly,  we 
draw  the  conclusion  from  the  left  half  of  the  table  that 
cube  performance  is  better  than  the  other  systems  while 
channel  BW  has  significant  influence  on  the  performance. 

(6)  Figure  8.2(b)  shows  the  system  throughput  for  the 
BW_ratio  of  1:10  and  PPW=10.  The  rest  of  the  parameters 
are  the  same  as  in  Figure  8.2(a) . Under  these  conditions, 
the  bus  architecture  consistently  performs  the  best  within 
the  medium  range  of  BW.  Mesh  architecture  is  always  the 
worst  under  such  BW_ratio. 

(7)  Figure  8.2(c)  shows  the  system  throughput  for  the 
BW_ratio  of  1:100  and  PPW=10.  Under  these  conditions,  bus 
architecture  significantly  outperforms  the  others. 

In  summary,  if  the  system  throughput  is  tested  under 
the  condition  of  a medium  complexity  query  mix,  a PPW  of  10 
for  each  PN,  and  a q_rate  of  100  queries/sec,  then  we  have 
the  following  conclusions: 
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(i)  With  a BW_ratio  of  1:1,  cube  is  obviously  the  best 

structure. 

(ii)  With  a BW_ratio  of  1:100,  bus  is  obviously  the  best 

structure. 

(iii)  With  a BW_ratio  of  1:10,  the  result  is  mixed  and  the 
performance  of  three  structures  are  dependent  on  other 
parameters  such  as  BW. 

In  other  words,  if  the  dollar  cost  for  setting  up  the 
communication  channels  among  PNs  is  not  limited,  then  under 
our  benchmark,  a cube  architecture  is  the  best  choice  since 
it  consistently  performs  better  than  the  other  types  of  ICN 
architectures  given  the  BW_ratio  being  1:1.  if  the  cost  for 
communication  has  to  be  limited  to  an  amount  so  that  a 
BW_ratio  is  1:100  for  cube  (and  mesh)  versus  bus  is  proper, 
then  bus  will  be  the  best  choice.  If  the  costs  for  these 
ICN  structures  correspond  to  a BW_ratio  of  1:10,  then  the 
choice  of  bus  or  cube  will  depend  on  other  parameters. 

Since  the  choice  of  ICNs  is  obvious  for  BW_ratios  other 
than  1:10,  we  concentrate  only  on  the  case  of  a BW  ratio 
equal  to  1:10  in  the  following  experiments. 

( C)  Figure  8.3 

Figure  8.3  shows  the  system  throughput  for  different 
‘L rates.  The  experiment  is  performed  under  the  environment 
of  a HCQM  type  of  query  mix,  a BW_ratio  of  1:10,  a PPW  of 
10,  and  a bus  structure  as  the  ICN  in  the  first  phase.  The 
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reason  that  the  HCQM  type  of  query  mix  is  used  is  because 
for  low  q_rate  (e.g.,  50  queries/sec,  or  10  queries/sec), 
the  contention  in  communication  channels  of  any  type  of 
ICNs  is  not  serious.  As  a result,  the  performance  curves 
for  the  three  ICN  structures  will  be  indistinguishable 
(especially  for  LCQM  case) . The  observations  obtained  from 
Figure  8.3  are  as  follows: 

(1)  As  shown  in  Figure  8.3(a),  for  high  complexity 
query  mix  with  high  query  arrival  rate  and  low  bandwidth, 
the  type  of  ICNs  used  is  not  much  of  a factor  to  the  overall 
throughput . 

(2)  As  shown  in  Figure  8.3(a),  for  high  complexity 
query  mix  with  high  query  arrival  rate,  cube  is  always 
better  than  the  other  two  ICNs. 

(3)  As  shown  in  Figure  8.3(c),  for  high  complexity 
query  mix  with  low  query  arrival  rate,  the  type  of  ICNs 
used  is  not  much  of  a factor. 

(4)  Comparing  all  three  figures,  the  higher  the  query 
arrival  rate,  the  higher  is  the  bandwidth  required  to 
reach  the  performance  bounds  (i.e.,  flat  area  in  the 
right-hand  side  of  the  three  figures) . The  opposite  is  true 
for  lower  query  arrival  rate.  This  is  because  when  the 
query  arrival  rate  is  low,  only  a small  amount  of  data 
needed  to  be  processed  and  transferred 


among  PNs . 
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Consequently,  the  system  can  reach  its  upper  bound 
performance  at  a low  communication  bandwidth. 

(D)  Figure  8.4 

Figure  8.4  shows  the  throughputs  of  3 ICN  architectures 
under  LCQM  by  varying  the  bandwidth  of  the  ICN.  The  other 
parameters  values  are  the  same  as  those  in  Figure  8.2(b) 
and  Figure  8.3(a).  This  figure  is  provided  to  be  compared 
against  Figure  8.2(b)  and  Figure  8.3(a)  to  show  the  effect 
of  complexity  of  query  mix  on  the  system  performance.  The 
observations  are  as  follows: 

(1)  As  shown  in  Figure  8.4,  for  low  complexity  query 
mix,  cube  is  superior  to  bus  only  at  low  bandwidth.  For 
medium  and  high  bandwidth,  bus  performs  significantly 
better  than  cube.  Mesh  has  the  worst  performance  in  all 
cases. 

(2)  Comparing  this  figure  with  Figure  8.2(b)  and 
Figure  8.3(a)  (note  that  they  have  the  same  condition 
except  for  different  types  of  query  mix) , we  can  see  that, 
in  general,  the  less  complex  queries  to  be  processed  in  a 
system,  the  higher  throughput  will  be  obtained. 

(3)  The  performance  upper  bound  is  reached  at  a 
relatively  lower  bandwidth  in  the  case  of  a low  complexity 
query  mix,  as  shown  in  Figure  8.4.  Both  bus  and  cube 
systems  reach  their  bounds  at  a B_BW  of  3-5  Mbytes/sec  (or 
equivalently  a C_BW  (or  M_BW)  of  0.3-0. 5 Mbytes/ sec ) . 
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However,  for  high  complexity  query  mix  as  shown  in  Figure 
8.3(a),  none  of  the  curves  reach  the  performance  upper  bound 
at  the  BW  of  10  Mbytes/sec. 

(4)  In  all  three  query  mixes,  cube  is  always  better 
than  mesh.  We  also  notice  that  the  curve  of  bus 
performance  rises  faster  than  the  other  two  curves  as  the 
percentage  of  complex  queries  decreases.  For  example,  the 
curve  for  the  bus  shows  the  worst  performance  for  a wide 
range  of  BW  in  Figure  8.3(a)  (25%  complex  queries).  In 
Figure  8.2(b)  (20%  complex  queries),  bus  curve  becomes  the 
best  for  most  of  the  BWs,  although  not  significantly.  And, 
in  Figure  8.4  (10%  complex  queries),  bus  is  significantly 
better  than  the  other  two  architectures,  if  the  bandwidth  is 
not  too  low.  This  interesting  fact  tells  us  that  bus  is 
very  sensitive  to  the  degree  of  complexity  of  a query  mix. 
Under  a LCQM  environment,  bus  is  a good  choice.  As  the 
degree  of  complexity  of  a query  mix  increases,  the 
performance  of  bus  system  drops  rapidly. 

(E)  Figure  8.5 

The  performance  is  determined  by  varying  PPW,  under  the 
condition  of  MCQM  query  mix  and  q_rate  of  100  queries/sec. 
We  show  curves  of  B_BW  at  0.1,  1,  io,  and  50  Mbytes/sec 
(the  BW_ratio  is  1:10).  The  observations  are  as  follows: 

(1)  When  the  B_BW  increases  from  1 Mbytes/sec  to  10 
Mbytes/sec,  the  throughput  increases  dramatically  for  all  3 
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ICN  types.  The  increase  slows  down  when  the  BW  is 
increased  from  10  Mbytes/sec  to  50  Mbytes/sec. 

(2)  The  increase  of  PPW  results  in  significant  increase  of 
performance  up  until  PPW=10  to  15.  Beyond  which  the 
increase  in  PPW  causes  limited  increase  in  throughput  and 
is  therefore  a waste  of  resource. 

8 . 2 A Summary  of  the  Results 

To  summarize  the  above  discussions  on  the  performance 
simulation  results,  we  list  in  the  following  some  of  the 
important  observations  and  conclusions. 

1.  Under  our  benchmark  assumptions,  the  second  phase 
is  the  processing  bottleneck  for  query  processing  in  the 
OFC . In  other  words,  the  system  performance  is  affected 
greatly  by  the  type  of  ICN2  being  used,  and  is  not  by  that 
of  ICN1.  Hence,  we  may  use  the  simpler,  cheaper,  and  more 
extensible  bus  structure  as  the  ICN  architecture  for  the 
first  phase. 

2.  If  the  dollar  cost  is  a consideration  in  the  design 
of  the  ICNs,  then  the  affordable  BW_ratio  is  an  important 
factor  for  choosing  the  type  of  interconnection  network. 

In  our  study,  under  the  condition  of  our  benchmark,  if 
the  BW_ratio  is  1:1,  then  cube  and  mesh  perform 
dramatically  better  than  bus.  If  the  BWratio  is  1:10,  then 
cube  is  best  only  for  low  BW,  and  bus  outperforms  cube  and 
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mesh  for  high  BW.  If  the  BW_ratio  is  1:100,  then  bus  is 
definitely  the  best  choice. 

3.  In  general,  the  system  performance  is  always 

improved  by  increasing  the  BW  of  ICN.  However,  the 

performance  will  reach  the  upper  bound  at  a certain  BW. 
The  higher  the  query  arrival  rate,  the  higher  is  the  BW 
required  to  reach  the  performance  upper  bound. 

4.  There  should  be  some  compatibility  between  the  PPW 
of  a processing  node  and  the  BW  of  the  ICN.  As  shown  in  the 
study  (see  Figure  8.5),  under  the  condition  of  a MCQM  type 
of  query  mix,  a q_rate  of  100  queries/sec,  and  a BW  ratio  of 
1:10,  the  increase  of  PPW  results  in  a significant  increase 
of  performance  up  until  PPW=10  to  15.  Beyond  which  the 
increase  in  PPW  results  in  limited  increase  in  throughput.  A 
compatable  BW  with  that  PPW  is  therefore  1 Mbytes/sec  for 
cube  and  mesh  and  10  Mbytes/sec  for  bus. 

5.  The  general  comments  on  the  relationships  of  system 
throughput  with  the  BW  of  an  ICN,  query  mix,  and  query 
arrival  rate  areas  follows: 

(a)  The  higher  is  the  percentage  of  complex  query  in  a query 
mix,  the  more  sensitive  is  the  system  performance  to  the 
BW  of  ICN,  and  the  use  of  the  cube  ICN  is  more 
justified. 

(b)  The  lower  is  the  percentage  of  complex  query  in  a query 
mix,  the  higher  is  the  overall  throughput  for  any  BW. 
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(c)  For  high  complexity  query  mix  with  low  query  arrival 
rate,  the  type  of  ICNs  used  is  not  much  of  a factor. 

(d)  For  high  complexity  query  mix  with  high  query  arrival 
rate,  cube  is  better  than  the  other  two. 

(e)  For  medium  complexity  query  mix,  bus  is  significantly 
better  than  cube  and  mesh  at  high  bandwidth. 

(f)  In  all  three  query  mixes,  cube  is  always  better  than 
mesh.  Bus  is  the  worst  within  a relatively  wide  range 
of  BW  when  the  percentage  of  complex  query  in  query  mix 
is  high,  and  is  the  best  when  the  percentage  of  complex 
query  in  query  mix  is  low. 

6.  Finally,  although  the  cube  ICN  is  an  intuitively 
attractive  ICN  for  database  machine  study,  it  is  shown  in 
our  study  that  for  the  OFC  architecture,  under  the 
conditions  of  our  benchmark,  the  simpler  and  less  expensive 
bus  structure  is  adequate  for  phase  one  interconnection 
and,  under  some  conditions,  also  for  phase  two 
interconnection.  The  use  of  the  cube  ICN  is  justified  for 
phase  two  of  OFC  under  the  conditions  that  the  dollar  cost 
is  not  a major  consideration  in  the  machine  design  or  the 
OFC  is  used  in  a HCQM  with  a high  query  arrival  rate 
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(b)  The  type  of  query  mix  is  MCQM  (i.e.,  40%  type  1, 
20%  type  III,  and  20%  type  IV  queries). 
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(c)  The  type  of  query  mix  is  HCQM  (i.e.,  25%  type  1, 
25%  type  III,  and  25%  type  IV  queries). 
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(d)  The  overall  throughput  for  all  9 combinations  of  the  ICN1  and  ICN2. 
The  type  of  query  mix  is  HCQM. 


Figure  8.1.  The  system  throughput  of  the  first  and  the 

second  phases  measured  under  different  query 
mixes  and  different  communication  bandwidths 
Parameter  values  are  q_rate=100  queries/sec, 
BW  ratio=l:10,  and  PPW=10. 
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(a)  BW_ratio=1:1 

Figure  8.2.  Throughput  measured  under  a MCQM  type  of  query 
mix  and  a q_rate  of  100  queries/sec.  The  first 
phase  architecture  is  a bus  structure. 
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Figure  8 . 2 ( continued) . 
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Figure  8.3.  Throughput  measured  under  a HCQM  type  of 
query  mix.  The  BW_ration  is  1:10,  PPW  is 
10,  and  the  first  phase  architecture  is  a 
bus  structure. 
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(b)  q_rate=50  queries/sec. 
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(c)  q_rate=10  queries/sec. 


Figure  8 . 3 (continued) . 
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Throughput  measured  under  a LCQM  type 
query  mix.  The  BW_ratio  is  1:10,  PPW 
10,  q_rate  is  100  queries/sec.  Bus 
structure  is  used  in  the  first  phase. 


B_BW 

(Mbytes/sec) 

of 

is 


Figure  8.4. 
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Figure  8.5.  Throughput  measured  under  MCQM,  a BW_ratio  of 
1:10,  and  a q_rate  of  100  queries/sec.  The 
first  phase  architecture  is  a bus  structure. 
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CHAPTER  9 
CONCLUSION 

In  this  dissertation,  we  presented  the  design  and 
evaluation  of  an  Object  Flow  Computer  and  a novel  two-phase 
parallel  query  processing  strategy  used  in  the  OFC. 

f erent  from  most  of  the  past  works  on  database  machines 
which  support  a primitive  relational  model,  the  OFC  is 
designed  to  directly  support  a high-level,  object-oriented 
semantic  data  model.  The  OFC  integrates  many  existing 
software  and  hardware  techniques  to  achieve  high  efficiency 
in  processing  large  databases.  These  techniques  include 
query  decomposition,  pipelined  and  data-flow  query 
processing  techniques,  and  special  hardware  for  supporting 
primitive  database  operations. 

Features  of  the  OFC  and  two-phase  PQPS  can  be 
summarized  as  follows.  The  horizontal  and  vertical 
concurrency  is  increased  during  query  processing.  The 
number  of  join  operations  required  in  query  processing  is 
decreased.  Unnecessary  data  transfer  is  minimized.  The  OFC 
and  the  two-phase  PQPS  are  especially  suitable  for  complex 
query  processing  on  a large  amount  of  data.  Finally,  the 
OFC  supports  the  distribution  of  application  codes  in 
addition  to  data. 
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In  general,  the  OFC  architecture  is  a multiprocessor- 
based  share-nothing  architecture,  in  which  data  with  the 
associated  methods,  constraints,  and  rules  are  distributed 
and  stored  privately  in  each  processing  node  (PN)  . 
Di-fferent  from  the  designs  of  most  other  database  machines 
in  which  an  ICN  (e.g.,  hypercube)  structure  is  simply 
chosen,  the  ICN  structure  of  the  OFC  is  chosen  based  on 
extensive  analysis  and  simulation.  Three  ICNs, 
representing  three  different  degrees  of  connectivity,  are 
evaluated  in  the  architecture  design.  A simulation  package, 
called  DACPES,  is  written  to  simulate  query  processing 
within  the  OFC.  Queries  are  generated  based  on  the 
Wisconsin  Benchmark.  The  result  is  described  in  detail  in 
Chapter  8.  In  brief,  the  result  shows  that  the  processing 
bottleneck  is  in  the  second  phase.  Hence,  a simpler  and 
cheaper  bus  ICN  is  adequate  in  the  first  phase.  And,  which 
ICN  should  be  used  in  the  second  phase  will  mainly  depend  on 
the  type  of  query  mix  and  the  BW_ratio.  The  use  of  cube  ICN 
for  the  second  phase  is  justified  under  the  conditions  that 


dollar  cost  is  not  a major  consideration  in  the  machine 
design  or  the  OFC  is  used  in  a HCQM  with  a high  query 
arrival  rate  environment. 

Some  future  research  directions  for  the  OFC  and 
processing  of  objects  in  general  are  outlined  below. 


154 


(1)  In  this  dissertation,  we  have  explored  data 
processing  techniques  in  detail.  However,  those  techniques 
are  essentially  based  on  conventional  ones  for  relational 
database  operations.  In  an  object-oriented  semantic  model 
like  the  OSAM* , relationships  among  objects  can  be 
explicitly  defined  and  stored  as  "links".  New  techniques 
for  accessing  data  using  the  link  information  need  to  be 
studied.  For  example  in  [GU089],  a link  algebra  is  proposed 
for  object  manipulation  in  the  OSAM*.  Direct  hardware 
support  for  this  high-level  data  manipulation  mechanism  (the 
link  algebra)  should  be  investigated. 

(2)  In  the  OFC,  object  classes  are  distributed  among 
the  processing  nodes,  associated  with  the  constraints, 
deductive  rules,  and  application  codes  (which  can  be  system— 
or  user-defined  methods)  . This  can  facilitate  the 
processing  of  constraints,  rules,  and  application  codes  in  a 
distributed  and  parallel  fashion,  in  addition  to  the 
distributed  and  parallel  processing  of  data.  An  integrated 
mechanism  to  support  this  distributed  approach  to  processing 
data,  rules,  and  methods  uniformly  need  to  be  studied. 

(3)  Implementation  of  the  OFC  and  the  two-phase  PQPS 
is  another  task  for  the  verification  of  our  ideas.  Possible 
approaches  include  (a)  using  pure  software  for  the 
implementation  of  the  OFC  as  concurrent  processes  within  a 
UNIX  environment,  (b)  implementation  of  the  OFC  using 
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commercially  available  technology  (e.g.,  the  connection 
machine,  transputers,  etc.);  (c)  implementation  of  the  OFC 
using  commercially  available  technology  as  in  (b)  and  custom 
VLSI  special  function  co-processors,  and  (d)  a phased-in 
approach  using  all  of  the  above.  Relevant  issues  to  the 
implementation  include  detailed  design  of  a PN  to  meet  the 
requirement  (i.e.,  PPW)  identified  in  the  dissertation,  the 
allocation  of  data  of  object  classes  onto  processing  nodes, 
efficisnt  indexing/clustering  scheme  for  storing  and 
retrieving  data,  concurrency  control,  security,  and  recovery 
within  the  OFC  and  two-phase  PQPS  environment. 
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